Note: The two data files “bgg_db_2017_04.csv” and “bgg_clean_dat.csv” can be found in the GitHub repo.
We all love Settlers of Catan, but what is it about Catan that makes it so addictive? Board games vary along many dimensions: they can have different minimum and maximum numbers of players, different play times, themes, mechanisms and designers, and even different levels of difficulty. We want to find out which of these attributes actually make a board game a good one, as measured by player ratings. Using data collected from boardgamegeek.com, we have player ratings and features for thousands of board games. We want to do some exploratory analysis of these features and potentially build models that predict which board games are more likely to be loved by players.
We identified the following objectives for our project:
1. Investigate the possible traits of highly rated board games using the ggplot2 package in R.
2. Build models to predict the success of a new board game (defined by a high average player rating) and find the important features of successful games.
3. Recommend board games to players based on certain specified criteria or other games that they love.
The first question we asked was:
What are the strongest predictors of a board game’s success (in terms of average player rating)?
Early in our EDA, we found that some general variables, such as rank or number of votes, are very predictive of rating but not useful, because they are not features of a board game that manufacturers or players can control.
Because of this, our question evolved into “What are some useful predictors of a board game’s success (in terms of rating)?” We found that several game categories and game mechanics were predictive of rating, so we focused on developing a predictive model based on these categorical variables plus some general variables, such as weight (game complexity) and year.
Other questions we were curious about include:
1. Different stratifications of our dataset, such as games by age group and by single-player/multiplayer involvement.
2. How game preferences and characteristics changed over time (board game evolution).
3. Whether different machine learning methods would predict ratings better, as measured by comparing their RMSEs.
4. Whether we could use the variables gathered in our dataset to build a good board game recommender based on Euclidean distances between board game features.
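The recommender idea in item 4 can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not necessarily the final recommender: the function name `recommend_similar()`, the three features, and the toy values for games A-D are hypothetical, and standardizing with `scale()` before computing distances is our choice, so that play time in minutes does not dominate weight and player count.

```r
# Hypothetical sketch: recommend the k games nearest to a liked game,
# using Euclidean distance on standardized numeric features.
recommend_similar <- function(features, liked, k = 3) {
  scaled <- scale(features)                          # center and scale each feature column
  dists <- as.matrix(dist(scaled, method = "euclidean"))[liked, ]
  dists <- dists[names(dists) != liked]              # exclude the liked game itself
  names(sort(dists))[seq_len(min(k, length(dists)))] # closest games first
}

# toy feature table (made-up values): rows are games, columns are features
features <- data.frame(weight      = c(2.3, 2.4, 3.9, 1.2),
                       avg_time    = c(60, 75, 180, 20),
                       max_players = c(4, 4, 4, 8),
                       row.names   = c("A", "B", "C", "D"))
recommend_similar(features, "A", k = 2)
```

Game B, whose toy features are nearly identical to A's, is returned first.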
Our board game data comes from a data set on Kaggle.com. The link to the data set is:
[link](https://www.kaggle.com/mrpantherson/board-game-data/data)
Our data cleaning work involves 3 parts:
1. Replace Wrong Values
2. Recode Numerical Values to Categorical Values
3. One-Hot Encode Mechanic and Category Columns
We found several variables whose values needed to be checked and replaced with appropriate values:
* min_players
* max_players
* weight
* avg_time
* min_time
* max_time
* year
We looked for all cells with a value of 0 or with values that did not make sense, searched the original website (boardgamegeek.com) for the correct information, and overwrote them. We also recoded some continuous variables as categorical.
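Rather than scanning by eye, suspicious cells can be located programmatically before looking them up on the website. The sketch below is an illustration under assumptions: the rules (a value of 0 or missing, or `max_players` below `min_players`) and the helper names `find_suspicious()` and `find_player_conflicts()` are ours, not part of the original pipeline.

```r
# Hypothetical helpers: list row indices worth checking by hand.
# Rule 1 (assumed): a value that is 0 or missing.
find_suspicious <- function(df, cols) {
  out <- lapply(cols, function(col) which(is.na(df[[col]]) | df[[col]] == 0))
  names(out) <- cols
  out
}

# Rule 2 (assumed): a maximum player count below the minimum player count.
find_player_conflicts <- function(df) {
  which(df$max_players < df$min_players)   # which() ignores NA comparisons
}

# toy data standing in for the real game data frame
toy <- data.frame(min_players = c(2, 0, 1, 3),
                  max_players = c(4, 0, NA, 2),
                  weight      = c(2.1, 0, 3.0, 1.5))
find_suspicious(toy, c("min_players", "max_players", "weight"))
find_player_conflicts(toy)
```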
game <- read.csv("bgg_data.csv")
### Overwriting implausible cells
# min_players
game$min_players[c(1616, 1962, 2273, 2408, 2693, 3332, 3850, 3902, 4145,
                   4247, 4311, 4442, 4480, 4815)] <- 1
# max_players
game$max_players[c(1616, 1962, 2273, 2408, 3902, 4145, 4442, 4480, 4815)] <- NA
game$max_players[c(2687, 2818, 3570, 3724, 3875, 4016, 4241, 4354, 4437,
                   4504, 4528, 4540, 4795, 4988)] <- 2
game$max_players[c(2739, 3356)] <- 4
game$max_players[3332] <- 1
# weight
game$weight[c(1477, 4381, 4521)] <- NA
# min_time, avg_time and max_time were cleaned in Excel and re-imported
game <- read.csv("bgg_clean_dat.csv", sep = " ", header = T)
We recoded the following variables, using the first and third quartiles as cutoffs for the three-level categories:
* min_players and max_players - into single-player, multiplayer, and party-game indicators
* min_time - into categories 0 (short), 1 (medium), 2 (long)
* avg_time - into categories 0 (short), 1 (medium), 2 (long)
* weight - into categories 0 (easy), 1 (medium), 2 (hard)
# recode players
game$single_player = 0
game$single_player[game$min_players == 1] = 1
game$multi_player = 0
game$multi_player[game$min_players > 1 & game$max_players <= 4] = 1
game$party_player = 0
game$party_player[game$max_players > 4] = 1
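These three indicators are not mutually exclusive: a game with a minimum of 1 player and a maximum above 4 is flagged as both single-player and party. The toy example below applies the same rules to a few hypothetical games to make the overlap visible.

```r
# Same player-group rules as above, applied to made-up games.
toy <- data.frame(min_players = c(1, 2, 1, 3),
                  max_players = c(1, 4, 6, 8))
toy$single_player <- as.integer(toy$min_players == 1)
toy$multi_player  <- as.integer(toy$min_players > 1 & toy$max_players <= 4)
toy$party_player  <- as.integer(toy$max_players > 4)
# number of groups each game falls into; the third game (1 to 6 players) is in two
toy$n_groups <- toy$single_player + toy$multi_player + toy$party_player
toy
```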
# recode min_time
# 0 = short, 1 = medium, 2 = long
quantile(game$min_time, na.rm = T)
## 0% 25% 50% 75% 100%
## 1 30 45 90 17280
game$cate_mintime = 0
game$cate_mintime[game$min_time >= 30 & game$min_time <= 90] = 1
game$cate_mintime[game$min_time > 90] = 2
# recode avg_time
# 0 = short, 1 = medium, 2 = long
quantile(game$avg_time, na.rm = T)
## 0% 25% 50% 75% 100%
## 1 30 60 120 22500
game$cate_avgtime = 0
game$cate_avgtime[game$avg_time >= 30 & game$avg_time <= 120] = 1
game$cate_avgtime[game$avg_time > 120] = 2
# recode weight
# 0 = easy, 1 = medium, 2 = hard
quantile(game$weight, na.rm = T)
## 0% 25% 50% 75% 100%
## 1.00000 1.73885 2.28915 2.88890 4.90480
game$cate_weight = 0
game$cate_weight[game$weight >= 1.73885 & game$weight <= 2.8889] = 1
game$cate_weight[game$weight > 2.8889] = 2
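The three recodes above share one pattern: start every row at 0, then overwrite rows in the middle and upper ranges. A side effect is that a row with a missing value (for example a missing `weight`) ends up coded as 0 instead of `NA`, because `NA` entries in a logical subscript are skipped during assignment. A minimal vectorized sketch with a hypothetical helper, `bin_by_quartiles()`, applies the same cutoff rule (below Q1, Q1 to Q3 inclusive, above Q3) but keeps missing values missing.

```r
# Hypothetical helper: 0 below Q1, 1 from Q1 to Q3 inclusive, 2 above Q3.
# ifelse() propagates NA, so missing inputs stay NA.
bin_by_quartiles <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  ifelse(x < q[1], 0L, ifelse(x <= q[2], 1L, 2L))
}

x <- c(10, 30, 45, 90, 200, NA)   # toy values; here Q1 = 30 and Q3 = 90
bin_by_quartiles(x)
```

Applied as `game$cate_weight <- bin_by_quartiles(game$weight)`, it should reproduce the cutoffs used above (up to the rounding of the hard-coded values), since both come from `quantile()` with its default type.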
# write final cleaned csv and import csv
write.table(game, "bgg_final_clean_dat.csv", sep = "|")
game <- read.csv("bgg_final_clean_dat.csv", sep = "|", header = T)
Each board game can have multiple mechanics and categories. We split the mechanics and categories into separate indicator columns, so that each board game has a 0 or 1 for every mechanic and category.
# recode mechanic
# find unique mechanics
mech_str <- paste(as.character(game$mechanic), collapse = ", ")
mech_unique <- unique(strsplit(mech_str, ", ")[[1]])
mech_unique_lower <- unlist(lapply(mech_unique, function(x) {paste(strsplit(tolower(x), " ")[[1]], collapse = "_")}))
# create one empty column for each unique mechanic
mechanic_col <- data.frame(matrix(0, ncol = length(mech_unique_lower), nrow = dim(game)[1]))
colnames(mechanic_col) <- mech_unique_lower
# fill in the values of the mechanic columns
fill_mech_col <- function(df, mechanic_col) {
for (i in 1:dim(df)[1]) {
mech_col_num <- which(mech_unique %in% c(strsplit(as.character(df$mechanic[i]), ", ")[[1]]))
for (j in mech_col_num) {
mechanic_col[i, j] <- 1
}
}
return(mechanic_col)
}
mechanic_col <- fill_mech_col(game, mechanic_col)
# recode categories
# find unique categories
cat_str <- paste(as.character(game$category), collapse = ", ")
cat_unique <- unique(strsplit(cat_str, ", ")[[1]])
cat_unique_lower <- unlist(lapply(cat_unique, function(x) {paste(strsplit(tolower(x), " ")[[1]], collapse = "_")}))
# create one empty column for each unique category
cat_col <- data.frame(matrix(0, ncol = length(cat_unique_lower), nrow = dim(game)[1]))
colnames(cat_col) <- cat_unique_lower
# fill in the values of the category columns
fill_cat_col <- function(df, cat_col) {
for (i in 1:dim(df)[1]) {
cat_col_num <- which(cat_unique %in% c(strsplit(as.character(df$category[i]), ", ")[[1]]))
for (j in cat_col_num) {
cat_col[i, j] <- 1
}
}
return(cat_col)
}
cat_col <- fill_cat_col(game, cat_col)
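The mechanic and category loops above differ only in the column they read. A single vectorized helper could encode either one; the sketch below is a hypothetical alternative (`one_hot_multi()` is our name) that builds the 0/1 matrix from `%in%` tests instead of filling a data frame cell by cell. Its column naming (lowercase, spaces replaced by underscores) closely mirrors the rule used above.

```r
# Hypothetical helper: one-hot encode a column of comma-separated labels.
one_hot_multi <- function(x, sep = ", ") {
  parts  <- strsplit(as.character(x), sep, fixed = TRUE)  # one vector of labels per game
  labels <- unique(unlist(parts))                          # unique labels, first-seen order
  # 0/1 matrix: one row per game, one column per label
  mat <- matrix(unlist(lapply(parts, function(p) as.integer(labels %in% p))),
                nrow = length(parts), byrow = TRUE)
  # naming rule similar to the loops above: lowercase, spaces -> underscores
  colnames(mat) <- gsub(" ", "_", tolower(labels), fixed = TRUE)
  as.data.frame(mat)
}

toy <- c("Dice Rolling, Hand Management",
         "Hand Management",
         "Worker Placement, Dice Rolling")
one_hot_multi(toy)
```

Usage on the real data would be, for example, `mechanic_col <- one_hot_multi(game$mechanic)`.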
df_new <- cbind(game, mechanic_col)
write.table(df_new, 'df_w_mechanic', sep = "|")
df_new2 <- cbind(game, cat_col)
write.table(df_new2, 'df_w_cat', sep = "|")
df_w_mechanic <- read.csv('df_w_mechanic', sep = "|")
df_w_cat <- read.csv('df_w_cat', sep = "|")
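# drop the "none" placeholder column
drops <- c('none')
df_mech_new <- df_w_mechanic[ , !(names(df_w_mechanic) %in% drops)]
# "memory" is both a mechanic and a category, so rename the mechanic column before combining
df_mech_new$memory_mechanic <- df_mech_new$memory
df_mech_final <- df_mech_new[ , !(names(df_mech_new) %in% 'memory')]
df_cat_new <- df_w_cat[ , !(names(df_w_cat) %in% drops)]
# keep only the category indicator columns (positions 27-109 of df_cat_new)
df_cat_final <- df_cat_new[, 27:109]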
df_recode_final <- cbind(df_mech_final, df_cat_final)
write.table(df_recode_final, 'df_recode_final_1127', sep = "|")
game <- read.csv("df_recode_final_1127", sep = "|")
Figure 1.1: Avg rating vs geek rating across game ranks by categories
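# packages used from here on
library(dplyr)    # %>%, group_by(), summarize(), mutate(), bind_cols()
library(tidyr)    # gather()
library(stringr)  # str_split_fixed()
library(tibble)   # as_tibble()
library(ggplot2)  # plots and ggplotGrob()
library(knitr)    # kable()
library(grid)     # grid.newpage(), grid.draw()
#loading game without separation of mech and cate into indicator variables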
game1 <- read.csv("bgg_final_clean_dat.csv", sep = "|", header = T)
# Functions
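split_into_multiple <- function(column, pattern = ", ", into_prefix){
# split each entry into as many pieces as the longest entry has; shorter rows are padded with ""
cols <- str_split_fixed(column, pattern, n = Inf)
cols[which(cols == "")] <- NA
# name the columns before converting, so as_tibble() does not warn about a matrix without column names
colnames(cols) <- paste(into_prefix, seq_len(ncol(cols)), sep = "_")
return(as_tibble(cols))
}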
#Splitting
game1 <- game1 %>% bind_cols(split_into_multiple(game1$category,',','category')) %>%
bind_cols(split_into_multiple(game1$mechanic,',','mechanic'))
#Cleaning
game1 <- game1 %>% select(-category, -mechanic, -designer, -image_url)
#Tidying
tidygame <- game1 %>% gather(key, categories, category_1:category_11, na.rm = TRUE) %>% select(-key) %>%
gather(key, mechanics, mechanic_1:mechanic_18, na.rm = TRUE) %>% select(-key)
tidygame$mechanics <- trimws(tidygame$mechanics)
tidygame$categories <- trimws(tidygame$categories)
#Categories vs ratings
tidygame %>%
group_by(categories) %>%
summarize(avgrating = mean(avg_rating), avggeek = mean(geek_rating), avgrank = mean(rank)) %>%
# order categories by mean rank once, so both point layers share the same x-axis order
mutate(categories = reorder(categories, avgrank)) %>%
ggplot() +
geom_point(aes(categories, avggeek, color = 'avg_geek'), size = 0.5) +
geom_point(aes(categories, avgrating, color = 'avg_rating'), size = 0.5) +
scale_colour_manual(name="Rating", values=c(avg_geek="red", avg_rating="blue")) +
theme(axis.text=element_text(size=8, angle = 60, hjust =1)) +
ylab("Rating") +
xlab("Category (ordered by mean rank)") +
ggtitle("Ratings by Categories")
Figure 1.2: Avg rating vs geek rating across game ranks by mechanics
#Mechanics vs ratings
tidygame %>% group_by(mechanics) %>% summarize(avgrating = mean(avg_rating), avggeek = mean(geek_rating), avgrank = mean(rank)) %>%
# order mechanics by mean rank once, so both point layers share the same x-axis order
mutate(mechanics = reorder(mechanics, avgrank)) %>%
ggplot() +
geom_point(aes(mechanics, avggeek, color = 'avg_geek'), size = 0.5) +
geom_point(aes(mechanics, avgrating, color = 'avg_rating'), size = 0.5) +
scale_colour_manual(name="Rating", values=c(avg_geek="red", avg_rating="blue")) + #adds legend
theme(axis.text=element_text(size=9, angle = 60, hjust = 1)) +
ylab("Rating") +
xlab("Mechanic (ordered by mean rank)") +
ggtitle("Ratings by Mechanics")
The plots show that, in general, geek ratings are lower than average ratings across all categories and mechanics.
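A likely reason for this gap is that BoardGameGeek's geek rating is a Bayesian average: a number of dummy votes at a baseline value are added before averaging, which pulls games with few votes toward that baseline. This fits the tables below, where obscure games with average ratings around 8.5 have geek ratings near 5.6. The sketch illustrates the idea only; as far as we know BGG has not published its exact parameters, so `prior_mean = 5.5` and `n_dummy = 100` are assumptions chosen for illustration.

```r
# Illustrative Bayesian average; prior_mean and n_dummy are assumed values,
# not BoardGameGeek's actual parameters.
bayesian_average <- function(avg_rating, num_votes, prior_mean = 5.5, n_dummy = 100) {
  # add n_dummy votes at prior_mean to the real votes, then average
  (avg_rating * num_votes + prior_mean * n_dummy) / (num_votes + n_dummy)
}

# few votes: pulled strongly toward the prior; many votes: barely moves
few_votes  <- bayesian_average(avg_rating = 8.5, num_votes = 50)     # 6.5
many_votes <- bayesian_average(avg_rating = 8.5, num_votes = 50000)  # about 8.494
```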
Figure 2: Average Rating vs. Age
df_recode_final <- read.csv('df_recode_final_1127', sep = "|")
df_age <- df_recode_final %>%
filter(age <= 21 & age > 0)
df_age$age <- as.factor(df_age$age)
df_age %>%
ggplot() +
geom_boxplot(aes(x = age, y = avg_rating, col = age)) +
theme(legend.position="none") +
ggtitle("Game Rating by Age") +
ylab("Average Rating") +
xlab("Age")
We looked at average ratings across minimum age groups. Board games aimed at younger children tend to be rated lower, while games with a minimum recommended age of 16-17 are rated highest. This is probably because boardgamegeek.com users are mostly teenagers or older, so they prefer more challenging games suited to their age.
Generate agecat from the age quartiles
Next, we reported the top 10 games for each age group and listed their average ratings and geek ratings. The age groups were split at the first and third quartiles of minimum age, giving the groups 0-8, 9-12, and 13-21. Each age group clearly favors different board games, and in all three age groups the games favored by the general population (average rating) differ from those favored by geeks (geek rating).
Q1 <- quantile(game$age, 0.25)
Q3 <- quantile(game$age, 0.75)
# 1: age <= Q1, 2: Q1 < age <= Q3, 3: age > Q3 (interval tests need comparison operators)
game <- game %>% mutate(agecat = ifelse(age <= Q1, 1, ifelse(age <= Q3, 2, 3)))
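A test of the form `age %in% range(0, Q1)` does not check an interval: `range(0, Q1)` evaluates to the two endpoints `c(0, Q1)`, so `%in%` is `TRUE` only when the age equals exactly 0 or Q1. Under that form, ages strictly between the endpoints fall through to group 3, which is consistent with 7 Wonders Duel (minimum age 10) carrying `agecat == 3` in the geek-rating table below. The toy check assumes Q1 = 8 and Q3 = 12, matching the 0-8 / 9-12 / 13-21 split described above.

```r
Q1 <- 8    # assumed quartiles, matching the 0-8 / 9-12 / 13-21 split
Q3 <- 12
ages <- c(0, 5, 8, 10, 12, 14)

endpoint_match <- ages %in% range(0, Q1)   # TRUE only for the endpoints 0 and 8
interval_match <- ages >= 0 & ages <= Q1   # TRUE for 0, 5 and 8

# interval-based grouping with comparison operators
agecat <- ifelse(ages <= Q1, 1, ifelse(ages <= Q3, 2, 3))
```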
Top 10 games by avg_rating for each age group
# agecat == 1 (age group 0-8)
game %>%
select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
filter(agecat == 1) %>%
arrange(desc(avg_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 8.74356 | 5.98863 | 2021 | 0 | 1 | 68820 | Enemy Action: Ardennes |
| 8.59714 | 5.68432 | 4007 | 0 | 1 | 39939 | The Battle of Fontenoy: 11 May, 1745 |
| 8.50652 | 5.66641 | 4257 | 0 | 1 | 223619 | Shadow War: Armageddon |
| 8.50000 | 5.63690 | 4786 | 0 | 1 | 193238 | Tunisia II |
| 8.49597 | 5.67612 | 4118 | 0 | 1 | 185380 | Exceed: Red Horizon – Satoshi & Mei Lien vs. Baelkhor & Morathi |
| 8.46923 | 5.84305 | 2693 | 0 | 1 | 99358 | Stonewall Jackson’s Way II |
| 8.46140 | 5.78569 | 3069 | 0 | 1 | 149620 | Advanced Squad Leader: Starter Kit Historical Module 1 – Decision at Elst |
| 8.44687 | 5.71536 | 3639 | 0 | 1 | 183578 | Wing Leader: Supremacy 1943-1945 |
| 8.42061 | 5.79244 | 3018 | 8 | 1 | 108018 | Riichi Mahjong |
| 8.41032 | 5.67778 | 4096 | 0 | 1 | 176596 | The Great Battles of Alexander: Macedonian Art of War |
# agecat == 2 (age group 9-12)
game %>%
select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
filter(agecat == 2) %>%
arrange(desc(avg_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 9.08970 | 8.15151 | 5 | 12 | 2 | 174430 | Gloomhaven |
| 8.91346 | 5.66261 | 4334 | 12 | 2 | 220308 | Gaia Project |
| 8.85597 | 6.48439 | 868 | 12 | 2 | 192135 | Too Many Bones |
| 8.77167 | 5.75141 | 3319 | 12 | 2 | 173504 | The Greatest Day: Sword, Juno, and Gold Beaches |
| 8.52372 | 5.71214 | 3672 | 12 | 2 | 199904 | Pericles: The Peloponnesian Wars |
| 8.45000 | 5.63642 | 4795 | 12 | 2 | 174298 | Napoleon’s Last Gamble |
| 8.43974 | 5.70077 | 3796 | 12 | 2 | 163399 | Infinity: Operation Icestorm |
| 8.41381 | 5.66988 | 4203 | 12 | 2 | 193867 | 1822: The Railways of Great Britain |
| 8.40513 | 6.94830 | 373 | 12 | 2 | 200680 | Agricola (revised edition) |
| 8.40132 | 5.95793 | 2142 | 12 | 2 | 32989 | Axis Empires: Totaler Krieg! |
# agecat == 3 (age group 13-21)
game %>%
select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
filter(agecat == 3) %>%
arrange(desc(avg_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 9.33167 | 5.79078 | 3026 | 14 | 3 | 186751 | Mythic Battles: Pantheon |
| 9.14646 | 5.64691 | 4591 | 14 | 3 | 198985 | Day Night Z |
| 8.89899 | 7.28089 | 150 | 17 | 3 | 55690 | Kingdom Death: Monster |
| 8.85900 | 5.76596 | 3194 | 15 | 3 | 144574 | Last Chance for Victory |
| 8.82781 | 5.65702 | 4425 | 14 | 3 | 168537 | Pandemonium |
| 8.82278 | 5.70251 | 3771 | 13 | 3 | 178896 | Last Blitzkrieg |
| 8.72977 | 8.30744 | 2 | 14 | 3 | 182028 | Through the Ages: A New Story of Civilization |
| 8.71368 | 5.98688 | 2025 | 16 | 3 | 63170 | 1817 |
| 8.66905 | 8.48904 | 1 | 13 | 3 | 161936 | Pandemic Legacy: Season 1 |
| 8.60654 | 5.78242 | 3091 | 16 | 3 | 85424 | La Bataille de la Moscowa (third edition) |
Top 10 games by geek_rating for each age group
# agecat == 1 (age group 0-8)
game %>%
select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
filter(agecat == 1) %>%
arrange(desc(geek_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 7.82454 | 7.69242 | 44 | 8 | 1 | 163412 | Patchwork |
| 8.05321 | 7.58639 | 61 | 8 | 1 | 194655 | Santorini |
| 7.67105 | 7.58506 | 62 | 8 | 1 | 30549 | Pandemic |
| 7.80825 | 7.57058 | 65 | 8 | 1 | 521 | Crokinole |
| 7.64090 | 7.50156 | 78 | 8 | 1 | 123260 | Suburbia |
| 7.59141 | 7.48291 | 84 | 8 | 1 | 14996 | Ticket to Ride: Europe |
| 7.66681 | 7.41349 | 102 | 8 | 1 | 31627 | Ticket to Ride: Nordic Countries |
| 7.67240 | 7.41118 | 104 | 8 | 1 | 188 | Go |
| 7.53092 | 7.39187 | 107 | 8 | 1 | 10630 | Memoir ’44 |
| 7.48190 | 7.38916 | 109 | 8 | 1 | 9209 | Ticket to Ride |
# agecat == 2 (age group 9-12)
game %>%
select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
filter(agecat == 2) %>%
arrange(desc(geek_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 8.29627 | 8.15458 | 4 | 12 | 2 | 120677 | Terra Mystica |
| 9.08970 | 8.15151 | 5 | 12 | 2 | 174430 | Gloomhaven |
| 8.37791 | 8.06267 | 8 | 12 | 2 | 167791 | Terraforming Mars |
| 8.17949 | 8.00663 | 10 | 12 | 2 | 102794 | Caverna: The Cave Farmers |
| 8.11355 | 7.99721 | 11 | 12 | 2 | 84876 | The Castles of Burgundy |
| 8.08780 | 7.98030 | 12 | 12 | 2 | 3076 | Puerto Rico |
| 8.05431 | 7.96041 | 14 | 12 | 2 | 31260 | Agricola |
| 8.08381 | 7.92190 | 17 | 12 | 2 | 25613 | Through the Ages: A Story of Civilization |
| 8.29052 | 7.86813 | 19 | 12 | 2 | 193738 | Great Western Trail |
| 7.94284 | 7.86145 | 21 | 12 | 2 | 2651 | Power Grid |
# agecat == 3 (age group 13-21)
game %>%
select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
filter(agecat == 3) %>%
arrange(desc(geek_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 8.66905 | 8.48904 | 1 | 13 | 3 | 161936 | Pandemic Legacy: Season 1 |
| 8.72977 | 8.30744 | 2 | 14 | 3 | 182028 | Through the Ages: A New Story of Civilization |
| 8.35745 | 8.22021 | 3 | 13 | 3 | 12333 | Twilight Struggle |
| 8.53049 | 8.15037 | 6 | 14 | 3 | 187645 | Star Wars: Rebellion |
| 8.32419 | 8.08622 | 7 | 14 | 3 | 169786 | Scythe |
| 8.18761 | 8.02304 | 9 | 10 | 3 | 173346 | 7 Wonders Duel |
| 8.38607 | 7.96376 | 13 | 13 | 3 | 115746 | War of the Ring (Second Edition) |
| 8.13872 | 7.93931 | 15 | 14 | 3 | 96848 | Mage Knight Board Game |
| 8.14718 | 7.92347 | 16 | 14 | 3 | 170216 | Blood Rage |
| 8.19903 | 7.89284 | 18 | 14 | 3 | 164153 | Star Wars: Imperial Assault |
For each age group, we took the top 100 ranked games and then found the 10 most frequent mechanics among them.
# agecat == 1 (age group 0-8)
top100_1 <- game %>%
filter(agecat == 1) %>%
arrange(rank) %>%
mutate(new_rank = 1:n()) %>%
filter(new_rank <= 100)
mechanic <- top100_1[,27:76]
m1 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)
# agecat == 2 (age group 9-12)
top100_2 <- game %>%
filter(agecat == 2) %>%
arrange(rank) %>%
mutate(new_rank = 1:n()) %>%
filter(new_rank <= 100)
mechanic <- top100_2[,27:76]
m2 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)
# agecat == 3 (age group 13-21)
top100_3 <- game %>%
filter(agecat == 3) %>%
arrange(rank) %>%
mutate(new_rank = 1:n()) %>%
filter(new_rank <= 100)
mechanic <- top100_3[,27:76]
m3 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)
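The three blocks above (and the matching player-group blocks later) repeat the same steps: filter a group, keep its 100 best-ranked games, and count the indicator columns. A hypothetical helper, `top_feature_counts()`, could wrap these steps; `feature_cols` stands in for hard-coded column ranges such as `27:76`, and the toy data frame is made up for the check.

```r
# Hypothetical helper: most frequent indicator features among the
# n_games best-ranked games of one group.
top_feature_counts <- function(df, in_group, feature_cols, n_games = 100, n_top = 10) {
  grp <- df[which(in_group), , drop = FALSE]            # rows in the group (NA treated as FALSE)
  grp <- grp[order(grp$rank), , drop = FALSE]           # best rank first
  top <- head(grp, n_games)                             # keep the n_games best-ranked games
  counts <- colSums(top[, feature_cols, drop = FALSE])  # how many top games use each feature
  head(sort(counts, decreasing = TRUE), n_top)
}

toy <- data.frame(rank  = c(3, 1, 2, 4),
                  group = c(1, 1, 1, 2),
                  dice  = c(1, 0, 1, 1),
                  cards = c(1, 1, 1, 0),
                  tiles = c(0, 0, 1, 1))
top_feature_counts(toy, toy$group == 1, c("dice", "cards", "tiles"), n_games = 2, n_top = 2)
```

On the real data, `m1` above would correspond to `top_feature_counts(game, game$agecat == 1, 27:76)`.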
We plotted the 10 most frequent mechanics among the top 100 ranked games for each age group. The three age groups turned out to have similar sets of preferred board game mechanics, as well as categories.
Figure 3.1: Top 10 Ranked Game Mechanics for each Age Group
m11 <- data.frame(names=names(m1), m1)
m22 <- data.frame(names=names(m2), m2)
m33 <- data.frame(names=names(m3), m3)
p1 <- m11 %>% mutate(freq = m1) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ffcccc") +
theme_light() +
theme(axis.text=element_text(size=8)) +
xlab('Age 0-8') +
ylab('Frequency') +
ylim(0,50)+
coord_flip()
p2 <- m22 %>% mutate(freq = m2) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff9999") +
theme_light() +
theme(axis.text=element_text(size=8)) +
xlab('Age 9-12') +
ylab('Frequency') +
ylim(0,50)+
coord_flip()
p3 <- m33 %>% mutate(freq = m3) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff4d4d") +
theme_light() +
theme(axis.text=element_text(size=8)) +
xlab('Age 13-21') +
ylab('Frequency') +
ylim(0,50)+
coord_flip()
grid.newpage()
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2), ggplotGrob(p3),size = "last"))
Figure 3.2: Top 10 Ranked Game Categories for each Age Group
c1 <- top100_1[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)
c2 <- top100_2[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)
c3 <- top100_3[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)
c11 <- data.frame(names=names(c1), c1)
c22 <- data.frame(names=c(names(c2)[1:7], 'manufacturing', names(c2)[9:10]), c2)
c33 <- data.frame(names=names(c3), c3)
pc1 <- c11 %>% mutate(freq = c1) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ffcccc") +
theme(axis.text=element_text(size=8)) +
xlab('Age 0-8') +
ylab('Frequency') +
ylim(0,40)+
coord_flip()
pc2 <- c22 %>% mutate(freq = c2) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff9999") +
theme(axis.text=element_text(size=8)) +
xlab('Age 9-12') +
ylab('Frequency') +
ylim(0,40)+
coord_flip()
pc3 <- c33 %>% mutate(freq = c3) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff4d4d") +
theme(axis.text=element_text(size=8)) +
xlab('Age 13-21') +
ylab('Frequency') +
ylim(0,40)+
coord_flip()
grid.newpage()
grid.draw(rbind(ggplotGrob(pc1), ggplotGrob(pc2), ggplotGrob(pc3),size = "last"))
First, we reported the top 10 games for each player group and listed their average ratings and geek ratings. We define single-player games as those playable by 1 player (minimum of 1 player), multiplayer games as those for 2-4 players (minimum above 1, maximum of at most 4), and party games as those supporting more than 4 players. The single-player and party groups can overlap: Kingdom Death: Monster, for example, appears in both. We found large differences between the top games ranked by geek rating and those ranked by average rating.
Top 10 games by avg_rating for each player group
#single_player
game %>%
select(avg_rating, geek_rating, rank, single_player, game_id, names) %>%
filter(single_player == 1) %>%
arrange(desc(avg_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | single_player | game_id | names |
|---|---|---|---|---|---|
| 9.33167 | 5.79078 | 3026 | 1 | 186751 | Mythic Battles: Pantheon |
| 9.14646 | 5.64691 | 4591 | 1 | 198985 | Day Night Z |
| 9.08970 | 8.15151 | 5 | 1 | 174430 | Gloomhaven |
| 8.91346 | 5.66261 | 4334 | 1 | 220308 | Gaia Project |
| 8.89899 | 7.28089 | 150 | 1 | 55690 | Kingdom Death: Monster |
| 8.85597 | 6.48439 | 868 | 1 | 192135 | Too Many Bones |
| 8.82781 | 5.65702 | 4425 | 1 | 168537 | Pandemonium |
| 8.74356 | 5.98863 | 2021 | 1 | 68820 | Enemy Action: Ardennes |
| 8.60267 | 5.68950 | 3938 | 1 | 204472 | Sub Terra |
| 8.59714 | 5.68432 | 4007 | 1 | 39939 | The Battle of Fontenoy: 11 May, 1745 |
#multi_player
game %>%
select(avg_rating, geek_rating, rank, multi_player, game_id, names) %>%
filter(multi_player == 1) %>%
arrange(desc(avg_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | multi_player | game_id | names |
|---|---|---|---|---|---|
| 8.85900 | 5.76596 | 3194 | 1 | 144574 | Last Chance for Victory |
| 8.82278 | 5.70251 | 3771 | 1 | 178896 | Last Blitzkrieg |
| 8.72977 | 8.30744 | 2 | 1 | 182028 | Through the Ages: A New Story of Civilization |
| 8.66905 | 8.48904 | 1 | 1 | 161936 | Pandemic Legacy: Season 1 |
| 8.59789 | 5.66709 | 4241 | 1 | 163097 | Beyond the Rhine |
| 8.53049 | 8.15037 | 6 | 1 | 187645 | Star Wars: Rebellion |
| 8.50652 | 5.66641 | 4257 | 1 | 223619 | Shadow War: Armageddon |
| 8.50219 | 6.50294 | 839 | 1 | 179803 | Arcadia Quest: Inferno |
| 8.49597 | 5.67612 | 4118 | 1 | 185380 | Exceed: Red Horizon – Satoshi & Mei Lien vs. Baelkhor & Morathi |
| 8.46140 | 5.78569 | 3069 | 1 | 149620 | Advanced Squad Leader: Starter Kit Historical Module 1 – Decision at Elst |
#party_player
game %>%
select(avg_rating, geek_rating, rank, party_player, game_id, names) %>%
filter(party_player == 1) %>%
arrange(desc(avg_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | party_player | game_id | names |
|---|---|---|---|---|---|
| 8.89899 | 7.28089 | 150 | 1 | 55690 | Kingdom Death: Monster |
| 8.82781 | 5.65702 | 4425 | 1 | 168537 | Pandemonium |
| 8.77167 | 5.75141 | 3319 | 1 | 173504 | The Greatest Day: Sword, Juno, and Gold Beaches |
| 8.71368 | 5.98688 | 2025 | 1 | 63170 | 1817 |
| 8.60654 | 5.78242 | 3091 | 1 | 85424 | La Bataille de la Moscowa (third edition) |
| 8.60267 | 5.68950 | 3938 | 1 | 204472 | Sub Terra |
| 8.47239 | 5.62707 | 4979 | 1 | 178754 | Z War One: Damnation |
| 8.43849 | 6.43910 | 927 | 1 | 131111 | Codex: Card-Time Strategy – Deluxe Set |
| 8.41381 | 5.66988 | 4203 | 1 | 193867 | 1822: The Railways of Great Britain |
| 8.38253 | 6.51317 | 823 | 1 | 184424 | Mega Civilization |
Top 10 games by geek_rating for each player group
#single_player
game %>%
select(avg_rating, geek_rating, rank, single_player, game_id, names) %>%
filter(single_player == 1) %>%
arrange(desc(geek_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | single_player | game_id | names |
|---|---|---|---|---|---|
| 9.08970 | 8.15151 | 5 | 1 | 174430 | Gloomhaven |
| 8.32419 | 8.08622 | 7 | 1 | 169786 | Scythe |
| 8.37791 | 8.06267 | 8 | 1 | 167791 | Terraforming Mars |
| 8.17949 | 8.00663 | 10 | 1 | 102794 | Caverna: The Cave Farmers |
| 8.05431 | 7.96041 | 14 | 1 | 31260 | Agricola |
| 8.13872 | 7.93931 | 15 | 1 | 96848 | Mage Knight Board Game |
| 8.30689 | 7.86329 | 20 | 1 | 205059 | Mansions of Madness: Second Edition |
| 8.39843 | 7.84010 | 24 | 1 | 205637 | Arkham Horror: The Card Game |
| 8.01272 | 7.83866 | 25 | 1 | 121921 | Robinson Crusoe: Adventures on the Cursed Island |
| 7.93099 | 7.80765 | 30 | 1 | 35677 | Le Havre |
#multi_player
game %>%
select(avg_rating, geek_rating, rank, multi_player, game_id, names) %>%
filter(multi_player == 1) %>%
arrange(desc(geek_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | multi_player | game_id | names |
|---|---|---|---|---|---|
| 8.66905 | 8.48904 | 1 | 1 | 161936 | Pandemic Legacy: Season 1 |
| 8.72977 | 8.30744 | 2 | 1 | 182028 | Through the Ages: A New Story of Civilization |
| 8.35745 | 8.22021 | 3 | 1 | 12333 | Twilight Struggle |
| 8.53049 | 8.15037 | 6 | 1 | 187645 | Star Wars: Rebellion |
| 8.18761 | 8.02304 | 9 | 1 | 173346 | 7 Wonders Duel |
| 8.11355 | 7.99721 | 11 | 1 | 84876 | The Castles of Burgundy |
| 8.38607 | 7.96376 | 13 | 1 | 115746 | War of the Ring (Second Edition) |
| 8.14718 | 7.92347 | 16 | 1 | 170216 | Blood Rage |
| 8.08381 | 7.92190 | 17 | 1 | 25613 | Through the Ages: A Story of Civilization |
| 8.29052 | 7.86813 | 19 | 1 | 193738 | Great Western Trail |
#party_player
game %>%
select(avg_rating, geek_rating, rank, party_player, game_id, names) %>%
filter(party_player == 1) %>%
arrange(desc(geek_rating)) %>%
head(10) %>% kable
| avg_rating | geek_rating | rank | party_player | game_id | names |
|---|---|---|---|---|---|
| 8.29627 | 8.15458 | 4 | 1 | 120677 | Terra Mystica |
| 8.32419 | 8.08622 | 7 | 1 | 169786 | Scythe |
| 8.37791 | 8.06267 | 8 | 1 | 167791 | Terraforming Mars |
| 8.17949 | 8.00663 | 10 | 1 | 102794 | Caverna: The Cave Farmers |
| 8.08780 | 7.98030 | 12 | 1 | 3076 | Puerto Rico |
| 8.05431 | 7.96041 | 14 | 1 | 31260 | Agricola |
| 8.19903 | 7.89284 | 18 | 1 | 164153 | Star Wars: Imperial Assault |
| 8.30689 | 7.86329 | 20 | 1 | 205059 | Mansions of Madness: Second Edition |
| 7.94284 | 7.86145 | 21 | 1 | 2651 | Power Grid |
| 8.00981 | 7.85384 | 23 | 1 | 72125 | Eclipse |
Then we plotted the 10 most frequent mechanics among the top 100 ranked games for each player group. As with the age groups, the three player groups have similar sets of preferred board game mechanics, as well as categories.
Figure 4.1: Top 10 Ranked Game Mechanics for each Player Group
#single-player
top100_1 <- game %>%
filter(single_player == 1) %>%
arrange(rank) %>%
mutate(new_rank = 1:n()) %>%
filter(new_rank <= 100)
mechanic <- top100_1[,27:76]
m1 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)
# multi-player
top100_2 <- game %>%
filter(multi_player == 1) %>%
arrange(rank) %>%
mutate(new_rank = 1:n()) %>%
filter(new_rank <= 100)
mechanic <- top100_2[,27:76]
m2 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)
# party-player
top100_3 <- game %>%
filter(party_player == 1) %>%
arrange(rank) %>%
mutate(new_rank = 1:n()) %>%
filter(new_rank <= 100)
mechanic <- top100_3[,27:76]
m3 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)
# plot
m11 <- data.frame(names=names(m1), m1)
m22 <- data.frame(names=names(m2), m2)
m33 <- data.frame(names=names(m3), m3)
p1 <- m11 %>% mutate(freq = m1) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ffcccc") +
theme(axis.text=element_text(size=8)) +
xlab('single_player') +
ylab('Frequency') +
ylim(0,50)+
coord_flip()
p2 <- m22 %>% mutate(freq = m2) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff9999") +
theme(axis.text=element_text(size=8)) +
xlab('multi_player') +
ylab('Frequency') +
ylim(0,50)+
coord_flip()
p3 <- m33 %>% mutate(freq = m3) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff4d4d") +
theme(axis.text=element_text(size=8)) +
xlab('party_player') +
ylab('Frequency') +
ylim(0,50)+
coord_flip()
grid.newpage()
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2), ggplotGrob(p3),size = "last"))
Figure 4.2: Top 10 Ranked Game Categories for each Player Group
c1 <- top100_1[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)
c2 <- top100_2[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)
c3 <- top100_3[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)
c11 <- data.frame(names=names(c1), c1)
c22 <- data.frame(names=names(c2), c2)
c33 <- data.frame(names=names(c3), c3)
pc1 <- c11 %>% mutate(freq = c1) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ffcccc") +
theme(axis.text=element_text(size=8)) +
xlab('single_player') +
ylab('Frequency') +
ylim(0,40) +
coord_flip()
pc2 <- c22 %>% mutate(freq = c2) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff9999") +
theme(axis.text=element_text(size=8)) +
xlab('multi_player') +
ylab('Frequency') +
ylim(0,40) +
coord_flip()
pc3 <- c33 %>% mutate(freq = c3) %>%
ggplot () +
geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff4d4d") +
theme(axis.text=element_text(size=8)) +
xlab('party_player') +
ylab('Frequency') +
ylim(0,40) +
coord_flip()
grid.newpage()
grid.draw(rbind(ggplotGrob(pc1), ggplotGrob(pc2), ggplotGrob(pc3),size = "last"))
Figure 5: Game Difficulty and Rating
# average rating and difficulty
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
df_recode_diff <- df_recode_final
df_recode_diff$min_time_category <- df_recode_final$cate_mintime
df_recode_diff %>%
ggplot() +
geom_point(aes(x = weight, y = avg_rating, col = factor(min_time_category))) +
xlab("Game Difficulty Level") +
ylab("Average Rating") +
ggtitle("Relationship of Game Difficulty and Rating") +
scale_color_manual("Game Length", values=cbPalette[1:3])
## Warning: Removed 3 rows containing missing values (geom_point).
We noticed that game rating tends to increase with difficulty level and that, as expected, more difficult games take longer to play.
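To put a number on this trend, a rank correlation between `weight` and `avg_rating` is a simple check. The sketch uses made-up toy vectors; on the real data the same call would be `cor(df_recode_final$weight, df_recode_final$avg_rating, method = "spearman", use = "complete.obs")`, which, like the plot, drops rows with missing weight. Spearman's rho is our choice because it captures monotone trends without assuming a straight-line relationship.

```r
# Toy vectors (made up); the last game has a missing weight and is dropped.
weight     <- c(1.2, 2.0, 2.5, 3.1, NA)
avg_rating <- c(6.0, 6.5, 7.2, 7.9, 8.0)
# rank correlation on complete pairs; a perfectly monotone toy trend gives 1
rho <- cor(weight, avg_rating, method = "spearman", use = "complete.obs")
rho
```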
First, we estimated the coefficients of a linear model with game category as the only predictor and visualized them (Figure 6.1). Because category is a single factor, each coefficient is the difference in geek rating from the omitted reference category, and the error bars show the 95% confidence interval of each estimate. Several categories, such as Environmental, Medical, Farming and Civilization, are positively associated with rating.
Second, we fit the same kind of single-predictor model with game mechanic as the predictor and visualized its coefficients (Figure 6.2). Several mechanics, such as Worker Placement, Grid Movement, Variable Phase Order and Role Playing, are positively associated with rating.
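With a single categorical predictor, `lm()` uses R's default treatment contrasts: the intercept is the mean of the reference level, and every other coefficient is a difference from that mean. The error bars come from `confint()`, which for a linear model is the estimate plus or minus a t quantile times the standard error. The check below uses R's built-in `mtcars` data as a stand-in, since the board-game data frame is not reproduced here.

```r
# Stand-in data: mpg explained by one factor, like rating ~ categories above.
fit <- lm(mpg ~ factor(cyl), data = mtcars)

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
tq  <- qt(0.975, df = df.residual(fit))  # two-sided 95% t quantile

manual <- cbind(lower = est - tq * se, upper = est + tq * se)
ci     <- confint(fit, level = 0.95)
all.equal(unname(manual), unname(ci))    # same intervals as confint()

# Intercept = mean of the reference level (cyl = 4);
# the next coefficient = mean of cyl = 6 minus mean of cyl = 4.
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
all.equal(unname(est[2]), unname(group_means["6"] - group_means["4"]))
```

A "positive" coefficient therefore means "rated higher than the reference category", not "rated high" in absolute terms.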
Figure 6.1: Beta Coefficients of Game Categories (linear model with categories variable only)
# Linear model with game category as the only predictor
model1 <- lm(geek_rating ~ categories, data = tidygame)
coeff <- as.data.frame(model1$coefficients)
coeff$covariates <- rownames(coeff)
colnames(coeff) <- c("coef", "covariates")
confint_cat <- as.data.frame(confint(model1))
# strip the "categories" prefix that lm() adds to the factor-level names
coeff_cat <- coeff
coeff_cat$covariates <- gsub("categories", "", coeff_cat$covariates)
head(coeff_cat)
## coef covariates
## (Intercept) 6.0280847 (Intercept)
## categoriesAction / Dexterity -0.0845758 Action / Dexterity
## categoriesAdventure 0.3181357 Adventure
## categoriesAge of Reason 0.1628377 Age of Reason
## categoriesAmerican Civil War -0.1357028 American Civil War
## categoriesAmerican Indian Wars 0.3046220 American Indian Wars
coeff_cat_error <- bind_cols(coeff_cat, confint_cat)
colnames(coeff_cat_error) <- c("coef", "covariates", "lower", "upper")
# plot every coefficient except the intercept, with its 95% CI as an error bar
ggplot(coeff_cat_error[-1, ], aes(x = reorder(covariates, coef))) +
geom_errorbar(aes(ymin = lower, ymax = upper), color = "grey70") +
geom_point(aes(y = coef), col = "#C1275C") +
coord_flip() +
labs(title = "Linear Model Coefficients for Category", y = "Coefficients", x = "Category") +
theme_light() +
theme(axis.text = element_text(size = 6))
Figure 6.2: Beta Coefficients of Game Mechanics (linear model with mechanics variable only)
# Linear model with game mechanic as the only predictor
model2 <- lm(geek_rating ~ mechanics, data = tidygame)
coeff <- as.data.frame(model2$coefficients)
coeff$covariates <- rownames(coeff)
colnames(coeff) <- c("coef", "covariates")
confint_mech <- as.data.frame(confint(model2))
# strip the "mechanics" prefix that lm() adds to the factor-level names
coeff_mech <- coeff
coeff_mech$covariates <- gsub("mechanics", "", coeff_mech$covariates)
coeff_mech_error <- bind_cols(coeff_mech, confint_mech)
colnames(coeff_mech_error) <- c("coef", "covariates", "lower", "upper")
# plot every coefficient except the intercept, with its 95% CI as an error bar
ggplot(coeff_mech_error[-1, ], aes(x = reorder(covariates, coef))) +
geom_errorbar(aes(ymin = lower, ymax = upper), color = "grey70") +
geom_point(aes(y = coef), col = "#C1275C") +
coord_flip() +
labs(title = "Linear Model Coefficients for Mechanics", y = "Coefficients", x = "Mechanics") +
theme_light() +
theme(axis.text = element_text(size = 8))
First, we plotted the yearly means of average rating and geek rating (Figure 1.1). The mean geek rating is always lower than the mean average rating. The two means changed little before 2015 but diverged after 2015, possibly because the most recent games have been rated by fewer players.
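Both patterns are consistent with BoardGameGeek's geek rating being a Bayesian average, in which a number of dummy votes at a middling value are blended with a game's real votes, pulling games with few votes toward that value. To our understanding the exact parameters are not public, so `prior_n = 100` dummy votes at `prior_mean = 5.5` below are illustrative assumptions, and `bayes_avg` is a hypothetical helper.

```r
# Hypothetical Bayesian average: blend the observed mean with prior_n dummy votes.
# prior_mean and prior_n are illustrative assumptions, not BGG's published values.
bayes_avg <- function(avg_rating, num_votes, prior_mean = 5.5, prior_n = 100) {
  (avg_rating * num_votes + prior_mean * prior_n) / (num_votes + prior_n)
}

bayes_avg(8, num_votes = 10)     # few votes: pulled far toward 5.5 (about 5.73)
bayes_avg(8, num_votes = 10000)  # many votes: stays close to 8 (about 7.98)
```

Under this kind of shrinkage, any game rated above the prior value gets a geek rating below its average rating, and the gap is widest for games with few votes, such as recent releases.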
Second, we plotted the mean game difficulty (weight) by year (Figure 1.2). Difficulty trended downward from 1980 to 2017, with considerable fluctuation from year to year, and the high value for 2018 is likely driven by an outlier.
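Because Figure 1.2 plots a yearly mean, a year with only a handful of games in the data can swing on a single title. One way to check for this, sketched here in base R on toy values, is to report the number of games and the median next to the mean; `yearly_weight` is a hypothetical helper name.

```r
# Hypothetical helper: per-year count, mean and median of game weight.
yearly_weight <- function(year, weight) {
  ok <- !is.na(weight)
  year <- year[ok]
  weight <- weight[ok]
  data.frame(
    year   = sort(unique(year)),
    n      = as.vector(tapply(weight, year, length)),
    mean   = as.vector(tapply(weight, year, mean)),
    median = as.vector(tapply(weight, year, median))
  )
}

# Toy values only: 2016 has five games, 2018 a single heavy game.
toy <- yearly_weight(
  year   = c(2016, 2016, 2016, 2016, 2016, 2018),
  weight = c(1.5, 2.0, 2.2, 2.4, 4.8, 4.6)
)
toy  # 2016: n = 5, mean 2.58, median 2.2; 2018: n = 1, mean = median = 4.6
```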
Then, we visualized the change in game popularity over the years (Figure 1.3) and noticed that players generally own more board games over time, although there was a drop after 2014.
Figure 1.1: The distribution of average rating and geek rating across years
game %>% group_by(year) %>%
filter(year > 1980 ) %>%
summarise(avg_mean = mean(avg_rating), geek_mean = mean(geek_rating)) %>%
ggplot()+
geom_point(aes(year, avg_mean, col = 'avg_mean'))+
geom_line(aes(year, avg_mean, col = 'avg_mean')) +
geom_point(aes(year, geek_mean, col = 'geek_mean'))+
geom_line(aes(year, geek_mean, col = 'geek_mean')) +
scale_colour_manual(name="rating", values=c(avg_mean="red",geek_mean ="blue")) +
ylab("rating") +
xlab("year") +
ggtitle('The distribution of average rating and geek rating across years') +
theme_grey()
Figure 1.2: Changing difficulty of board games across years
game %>% group_by(year) %>%
filter(year > 1980) %>%
summarise(m = mean(weight, na.rm = TRUE)) %>%
ggplot(aes(year, m))+
geom_point(color = 'blue')+
geom_line(color = 'blue') +
ylab("Difficulty") +
xlab("year") +
ggtitle('Changing difficulty of board games across years')+
theme_grey()
Figure 1.3: The Distribution of Mean Rank of Board Games across Years
game %>% group_by(year) %>%
filter(year > 1980) %>%
summarise(m = mean(rank)) %>%
ggplot(aes(year, m))+
geom_point(color = 'blue')+
geom_line(color = 'blue') +
ylab("Mean Rank") +
xlab("year") +
ggtitle('The Distribution of Mean Rank of Board Games across Years')+
theme_grey()
Finally, we plotted the change in game mechanics and themes over the years 1980-2018.
For mechanics, hand management has become one of the dominant mechanics over the past few decades, while dice rolling has been popular throughout the whole period.
For themes (categories), war games were very popular in the 1980s but have become much less common since; card games have steadily gained popularity, and many fantasy games have appeared in the last 10 years.
Figure 2.1: Change in game mechanics
# Change in game mechanics
# filter year to be on or after 1980
df_new1 <- game %>% filter(year >= 1980)
# bin years into 5-year intervals (the last interval, 2015-2018, covers only three years)
df_new1$year_group <- cut(df_new1$year, breaks = c(1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2018), include.lowest = TRUE)
# get percentage of games with each top mechanic
dice_rolling <- df_new1 %>%
group_by(year_group) %>%
summarize(percent = sum(dice_rolling, na.rm = TRUE) / n()) %>%
mutate(mechanic = "dice_rolling")
hand_management <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(hand_management, na.rm = TRUE) / n()) %>%
mutate(mechanic = "hand_management")
variable_player_powers <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(variable_player_powers, na.rm = TRUE) / n()) %>%
mutate(mechanic = "variable_player_powers")
set_collection <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(set_collection, na.rm = TRUE) / n()) %>%
mutate(mechanic = "set_collection")
area_control_._area_influence <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(area_control_._area_influence, na.rm = TRUE) / n()) %>%
mutate(mechanic = "area_control_._area_influence")
card_drafting <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(card_drafting, na.rm = TRUE) / n()) %>%
mutate(mechanic = "card_drafting")
# plot
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
ggplot() +
geom_point(aes(x = dice_rolling$year_group, y = dice_rolling$percent, col = 'Dice Rolling')) +
geom_line(aes(x = dice_rolling$year_group, y = dice_rolling$percent, group = 1, col = 'Dice Rolling')) +
geom_point(aes(x = hand_management$year_group, y = hand_management$percent, col = 'Hand Management')) +
geom_line(aes(x = hand_management$year_group, y = hand_management$percent, group = 1, col = 'Hand Management')) +
geom_point(aes(x = variable_player_powers$year_group, y = variable_player_powers$percent, col = 'Variable Player Powers')) +
geom_line(aes(x = variable_player_powers$year_group, y = variable_player_powers$percent, group = 1, col = 'Variable Player Powers')) +
geom_point(aes(x = set_collection$year_group, y = set_collection$percent, col = 'Set Collection')) +
geom_line(aes(x = set_collection$year_group, y = set_collection$percent, group = 1, col = 'Set Collection')) +
geom_point(aes(x = area_control_._area_influence$year_group, y = area_control_._area_influence$percent, col = 'Area Control/Area Influence')) +
geom_line(aes(x = `area_control_._area_influence`$year_group, y = area_control_._area_influence$percent, group = 1, col = 'Area Control/Area Influence')) +
geom_point(aes(x = card_drafting$year_group, y = card_drafting$percent, col = 'Card Drafting')) +
geom_line(aes(x = card_drafting$year_group, y = card_drafting$percent, group = 1, col = 'Card Drafting')) +
scale_colour_manual("",
breaks = c("Dice Rolling", "Hand Management", "Variable Player Powers", "Set Collection", "Area Control/Area Influence", "Card Drafting"),
values = cbPalette[1:6]) +
scale_x_discrete(breaks = dice_rolling$year_group,
labels = seq(1980, 2015, 5)) +
xlab("Year") +
ylab("Percentage of games") +
ggtitle("Evolution of Game Mechanics 1980 - 2018") +
theme(legend.position="bottom")
Figure 2.2: Change in Game Categories
# change in game categories
# get percentage of games with top category
card_game <- df_new1 %>%
group_by(year_group) %>%
summarize(percent = sum(card_game, na.rm = TRUE) / n()) %>%
mutate(mechanic = "card_game")
wargame <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(wargame, na.rm = TRUE) / n()) %>%
mutate(mechanic = "wargame")
fantasy <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(fantasy, na.rm = TRUE) / n()) %>%
mutate(mechanic = "fantasy")
economic <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(economic, na.rm = TRUE) / n()) %>%
mutate(mechanic = "economic")
fighting <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(fighting, na.rm = TRUE) / n()) %>%
mutate(mechanic = "fighting")
science_fiction <- df_new1 %>%
group_by(year_group) %>%
summarise(percent = sum(science_fiction, na.rm = TRUE) / n()) %>%
mutate(mechanic = "science_fiction")
# plot
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
ggplot() +
geom_point(aes(x = card_game$year_group, y = card_game$percent, col = 'Card Game')) +
geom_line(aes(x = card_game$year_group, y = card_game$percent, group = 1, col = 'Card Game')) +
geom_point(aes(x = wargame$year_group, y = wargame$percent, col = 'War Game')) +
geom_line(aes(x = wargame$year_group, y = wargame$percent, group = 1, col = 'War Game')) +
geom_point(aes(x = fantasy$year_group, y = fantasy$percent, col = 'Fantasy')) +
geom_line(aes(x = fantasy$year_group, y = fantasy$percent, group = 1, col = 'Fantasy')) +
geom_point(aes(x = economic$year_group, y = economic$percent, col = 'Economic')) +
geom_line(aes(x = economic$year_group, y = economic$percent, group = 1, col = 'Economic')) +
geom_point(aes(x = fighting$year_group, y = fighting$percent, col = 'Fighting')) +
geom_line(aes(x = fighting$year_group, y = fighting$percent, group = 1, col = 'Fighting')) +
geom_point(aes(x = science_fiction$year_group, y = science_fiction$percent, col = 'Science Fiction')) +
geom_line(aes(x = science_fiction$year_group, y = science_fiction$percent, group = 1, col = 'Science Fiction')) +
scale_colour_manual("",
breaks = c("Card Game", "War Game", "Fantasy", "Economic", "Fighting", "Science Fiction"),
values = cbPalette[1:6]) +
scale_x_discrete(breaks = card_game$year_group,
labels = seq(1980, 2015, 5)) +
xlab("Year") +
ylab("Percentage of games") +
ggtitle("Evolution of Game Categories 1980 - 2018") +
theme(legend.position="bottom")
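The twelve `summarise()` blocks in Figures 2.1 and 2.2 differ only in the column name. A more compact alternative, sketched here in base R and assuming each mechanic or category column is a 0/1 (or logical) indicator, computes every share in one pass and returns a long data frame. `share_by_group` is a hypothetical name, and the toy data is made up.

```r
# Hypothetical helper: share of games in each group that have each indicator flag.
# Mirrors sum(x, na.rm = TRUE) / n() above, so a missing flag counts as absent.
share_by_group <- function(df, group_col, flag_cols) {
  groups <- split(df[flag_cols], df[[group_col]], drop = TRUE)
  rows <- lapply(names(groups), function(g) {
    d <- groups[[g]]
    data.frame(year_group = g,
               feature    = flag_cols,
               percent    = unname(colSums(d, na.rm = TRUE)) / nrow(d))
  })
  out <- do.call(rbind, rows)
  # keep the groups in their original (chronological) order for plotting
  out$year_group <- factor(out$year_group, levels = names(groups))
  out
}

# Toy indicator data (not from the BGG file).
toy <- data.frame(
  year_group      = factor(c("1980", "1980", "1985", "1985")),
  dice_rolling    = c(1, 0, 1, NA),
  hand_management = c(1, 1, 0, 0)
)
share_by_group(toy, "year_group", c("dice_rolling", "hand_management"))
```

On the real data, a call such as `share_by_group(df_new1, "year_group", c("dice_rolling", "hand_management"))` would replace the separate blocks, and a single `ggplot(res, aes(year_group, percent, colour = feature, group = feature)) + geom_point() + geom_line()` would replace the repeated `geom_point()`/`geom_line()` pairs.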
We want to predict the success of a board game, measured by its average rating on boardgamegeek.com.
We tried four machine learning methods: linear regression, k-nearest neighbors, random forest and support vector machine.
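The models are fit with caret, using `trainControl(method = 'cv', number = 20)` and RMSE as the metric. The idea behind that resampling can be sketched in base R: split the training rows into k folds, fit on k - 1 folds, compute the RMSE on the held-out fold, and average over folds. This illustrates k-fold cross-validation in general, not caret's internal code; `cv_rmse` is a hypothetical helper, shown on the built-in `mtcars` data.

```r
# Hypothetical k-fold cross-validated RMSE for a linear model.
cv_rmse <- function(formula, data, k = 20, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # fold id for each row
  outcome <- all.vars(formula)[1]
  fold_rmse <- vapply(seq_len(k), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test[[outcome]] - pred)^2))
  }, numeric(1))
  mean(fold_rmse)  # average of the per-fold RMSEs
}

cv_rmse(mpg ~ wt + hp, data = mtcars, k = 5)
```

Averaging held-out errors gives a less optimistic estimate than the in-sample residual standard error, which is why it is a fairer basis for comparing the four methods.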
Building the training and test sets
# import data
set.seed(1)
game <- read.csv("df_recode_final_1127", header = TRUE, sep = "|")
# drop the irrelevant columns, such as game ID, names and designer
g1 <- game[, -c(2:4, 14, 16, 18:19, 77)] # 4750 rows, 152 columns
# drop rows with missing values
g <- drop_na(g1) # 4666 rows, 152 columns
# Split the data into training (80%) and test (20%) sets with caret's createDataPartition()
inTrain <- createDataPartition(y = g$avg_rating,
p = 0.8)$Resample1
train_set <- slice(g, inTrain)
test_set <- slice(g, -inTrain)
# 20-fold cross-validation
control <- trainControl(method = 'cv', number = 20)
# Fit a linear model on all remaining covariates, evaluated by cross-validated RMSE
model <- train(avg_rating ~ .,
data = train_set,
method = "lm",
na.action = na.exclude,
trControl = control,
metric = "RMSE")
summary(model)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.27256 -0.24178 -0.03963 0.20882 1.96684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.705e+00 2.518e-01 10.742 < 2e-16 ***
## rank 6.808e-06 9.737e-06 0.699 0.484474
## min_players -6.909e-03 1.611e-02 -0.429 0.668119
## max_players 3.831e-04 3.639e-04 1.053 0.292549
## avg_time 5.154e-03 9.482e-04 5.436 5.80e-08 ***
## min_time -1.405e-04 4.138e-05 -3.396 0.000691 ***
## max_time -4.995e-03 9.474e-04 -5.272 1.42e-07 ***
## year 1.213e-05 4.854e-05 0.250 0.802641
## geek_rating 6.094e-01 3.502e-02 17.403 < 2e-16 ***
## num_votes 8.335e-06 9.630e-06 0.866 0.386792
## age -8.700e-03 2.174e-03 -4.002 6.41e-05 ***
## owned -2.012e-05 7.182e-06 -2.801 0.005122 **
## weight 2.533e-01 2.194e-02 11.544 < 2e-16 ***
## single_player 9.347e-02 3.751e-02 2.491 0.012765 *
## multi_player -1.773e-02 3.734e-02 -0.475 0.634888
## party_player -9.563e-02 3.476e-02 -2.751 0.005965 **
## cate_mintime -1.592e-01 1.964e-02 -8.109 6.85e-16 ***
## cate_avgtime 1.082e-01 2.107e-02 5.136 2.95e-07 ***
## cate_weight 5.263e-02 2.139e-02 2.461 0.013905 *
## action_point_allowance_system 5.804e-02 2.487e-02 2.333 0.019678 *
## co.operative_play 1.376e-01 3.014e-02 4.566 5.13e-06 ***
## hand_management -1.272e-03 1.650e-02 -0.077 0.938561
## point_to_point_movement 2.455e-03 2.915e-02 0.084 0.932898
## set_collection -1.373e-02 1.928e-02 -0.712 0.476529
## trading -4.876e-02 4.075e-02 -1.197 0.231534
## variable_player_powers 2.353e-02 2.038e-02 1.155 0.248347
## auction.bidding -7.090e-02 2.504e-02 -2.831 0.004661 **
## card_drafting 4.793e-03 2.073e-02 0.231 0.817175
## area_control_._area_influence -2.340e-02 2.132e-02 -1.098 0.272469
## campaign_._battle_card_driven 8.760e-02 3.535e-02 2.478 0.013243 *
## dice_rolling -2.046e-02 1.814e-02 -1.128 0.259401
## simultaneous_action_selection -2.242e-02 2.525e-02 -0.888 0.374627
## route.network_building 4.323e-02 3.625e-02 1.193 0.233127
## variable_phase_order 7.600e-02 3.916e-02 1.941 0.052367 .
## action_._movement_programming -7.939e-03 4.185e-02 -0.190 0.849565
## grid_movement 3.267e-02 2.859e-02 1.143 0.253225
## modular_board -5.434e-02 2.216e-02 -2.452 0.014242 *
## storytelling 2.242e-01 5.618e-02 3.991 6.71e-05 ***
## area_movement -4.554e-02 2.727e-02 -1.670 0.094975 .
## tile_placement -1.502e-02 2.259e-02 -0.665 0.506075
## worker_placement 3.595e-02 2.934e-02 1.225 0.220522
## deck_._pool_building 2.139e-01 3.070e-02 6.969 3.74e-12 ***
## role_playing -2.092e-02 4.174e-02 -0.501 0.616198
## partnerships 5.404e-03 2.847e-02 0.190 0.849489
## pick.up_and_deliver -1.036e-01 3.820e-02 -2.711 0.006737 **
## player_elimination 1.286e-04 3.648e-02 0.004 0.997188
## secret_unit_deployment -5.937e-02 3.627e-02 -1.637 0.101753
## pattern_recognition -1.134e-01 5.122e-02 -2.214 0.026898 *
## press_your_luck 4.000e-02 3.447e-02 1.161 0.245896
## time_track -9.216e-03 7.360e-02 -0.125 0.900351
## voting -3.797e-02 4.833e-02 -0.786 0.432125
## area.impulse 1.639e-01 8.581e-02 1.910 0.056214 .
## hex.and.counter 8.005e-02 3.210e-02 2.494 0.012669 *
## area_enclosure -1.205e-01 5.138e-02 -2.345 0.019068 *
## pattern_building -1.990e-02 3.974e-02 -0.501 0.616663
## take_that 3.401e-03 3.676e-02 0.093 0.926297
## stock_holding 1.673e-01 4.989e-02 3.355 0.000803 ***
## commodity_speculation -1.241e-01 4.833e-02 -2.569 0.010243 *
## simulation 6.762e-02 2.867e-02 2.358 0.018408 *
## betting.wagering 4.237e-02 4.783e-02 0.886 0.375721
## trick.taking -2.629e-02 4.899e-02 -0.537 0.591588
## line_drawing -7.568e-03 9.270e-02 -0.082 0.934936
## rock.paper.scissors -1.719e-01 6.937e-02 -2.478 0.013264 *
## roll_._spin_and_move -7.899e-02 4.396e-02 -1.797 0.072412 .
## paper.and.pencil -6.102e-02 6.226e-02 -0.980 0.327121
## acting 1.801e-01 7.277e-02 2.475 0.013369 *
## singing -2.967e-01 1.818e-01 -1.632 0.102865
## chit.pull_system 1.295e-01 5.422e-02 2.388 0.016992 *
## crayon_rail_system 9.567e-02 1.131e-01 0.846 0.397771
## environmental 3.035e-02 7.312e-02 0.415 0.678117
## medical 1.900e-01 9.794e-02 1.940 0.052419 .
## card_game -1.801e-03 1.879e-02 -0.096 0.923681
## civilization -1.029e-02 3.769e-02 -0.273 0.784892
## economic -6.635e-03 2.495e-02 -0.266 0.790260
## modern_warfare 4.369e-02 5.417e-02 0.807 0.419905
## political -1.013e-01 3.699e-02 -2.738 0.006214 **
## wargame 7.744e-02 3.000e-02 2.582 0.009872 **
## fantasy 1.601e-02 2.158e-02 0.742 0.458174
## territory_building 2.351e-02 3.387e-02 0.694 0.487658
## adventure -3.558e-02 3.316e-02 -1.073 0.283360
## exploration -3.738e-02 3.185e-02 -1.174 0.240543
## fighting 1.313e-02 2.407e-02 0.546 0.585407
## miniatures 1.836e-01 3.184e-02 5.768 8.69e-09 ***
## dice 1.824e-02 2.905e-02 0.628 0.530134
## movies_._tv_._radio_theme -1.576e-02 3.838e-02 -0.411 0.681389
## science_fiction -1.169e-02 2.652e-02 -0.441 0.659543
## industry_._manufacturing -3.819e-02 4.535e-02 -0.842 0.399805
## ancient -5.606e-02 2.968e-02 -1.889 0.058991 .
## city_building -6.181e-02 3.102e-02 -1.993 0.046374 *
## animals 6.399e-03 3.078e-02 0.208 0.835300
## farming -2.167e-02 5.415e-02 -0.400 0.689101
## medieval -6.132e-02 2.594e-02 -2.364 0.018139 *
## novel.based -1.247e-01 4.221e-02 -2.954 0.003152 **
## mythology 4.640e-02 4.222e-02 1.099 0.271929
## american_west -5.054e-02 5.332e-02 -0.948 0.343266
## horror -1.512e-02 3.901e-02 -0.388 0.698290
## murder.mystery -4.433e-02 6.008e-02 -0.738 0.460619
## puzzle 6.101e-02 4.791e-02 1.273 0.202962
## video_game_theme -6.683e-03 5.628e-02 -0.119 0.905484
## space_exploration -1.423e-01 5.594e-02 -2.544 0.011007 *
## collectible_components -7.883e-02 4.701e-02 -1.677 0.093652 .
## bluffing 1.968e-02 2.846e-02 0.691 0.489326
## transportation 1.035e-02 4.506e-02 0.230 0.818337
## religious -1.925e-02 7.223e-02 -0.266 0.789892
## travel -9.957e-02 6.844e-02 -1.455 0.145782
## nautical -7.166e-02 3.392e-02 -2.113 0.034691 *
## deduction 5.812e-02 3.611e-02 1.609 0.107597
## party_game 1.309e-01 3.531e-02 3.706 0.000214 ***
## spies.secret_agents 1.081e-01 5.738e-02 1.884 0.059596 .
## word_game 1.133e-02 5.979e-02 0.189 0.849718
## mature_._adult 1.205e-01 1.058e-01 1.139 0.254619
## renaissance -3.733e-02 3.986e-02 -0.937 0.349042
## zombies -3.100e-02 6.058e-02 -0.512 0.608934
## negotiation 7.657e-02 3.852e-02 1.988 0.046882 *
## abstract_strategy -3.715e-02 3.100e-02 -1.199 0.230738
## prehistoric -1.644e-01 6.706e-02 -2.451 0.014296 *
## arabian -4.630e-02 7.438e-02 -0.622 0.533667
## aviation_._flight -7.081e-02 4.747e-02 -1.492 0.135899
## post.napoleonic 1.059e-01 8.375e-02 1.264 0.206203
## trains 5.994e-02 4.935e-02 1.215 0.224583
## action_._dexterity 1.596e-01 3.857e-02 4.138 3.57e-05 ***
## world_war_i 1.256e-01 6.476e-02 1.939 0.052578 .
## world_war_ii 1.155e-01 3.509e-02 3.291 0.001008 **
## comic_book_._strip 6.079e-02 5.807e-02 1.047 0.295308
## racing -1.739e-02 3.862e-02 -0.450 0.652449
## real.time -7.946e-02 3.829e-02 -2.075 0.038029 *
## humor -6.121e-02 3.416e-02 -1.792 0.073199 .
## electronic -2.417e-02 8.020e-02 -0.301 0.763122
## book -4.749e-02 1.128e-01 -0.421 0.673667
## civil_war 2.128e-01 8.190e-02 2.599 0.009394 **
## expansion_for_base.game 4.463e-01 2.208e-01 2.021 0.043306 *
## sports 1.278e-01 4.407e-02 2.900 0.003747 **
## pirates -1.288e-02 5.023e-02 -0.256 0.797627
## age_of_reason -3.489e-03 6.563e-02 -0.053 0.957603
## american_indian_wars 1.035e-01 1.262e-01 0.820 0.412384
## american_revolutionary_war 1.552e-01 1.058e-01 1.467 0.142429
## educational 1.710e-01 5.793e-02 2.953 0.003171 **
## memory -9.770e-02 6.127e-02 -1.595 0.110902
## maze 4.133e-02 7.514e-02 0.550 0.582284
## napoleonic 1.364e-01 5.801e-02 2.352 0.018749 *
## print_._play 1.686e-01 4.668e-02 3.612 0.000307 ***
## american_civil_war 1.604e-01 5.633e-02 2.848 0.004428 **
## children.s_game -4.896e-02 3.990e-02 -1.227 0.219924
## vietnam_war -4.852e-02 1.247e-01 -0.389 0.697204
## pike_and_shot -6.680e-03 1.148e-01 -0.058 0.953599
## mafia -6.818e-02 7.214e-02 -0.945 0.344642
## trivia -1.173e-03 7.071e-02 -0.017 0.986765
## number 5.263e-02 9.444e-02 0.557 0.577371
## game_system 2.085e-01 1.127e-01 1.851 0.064228 .
## korean_war 9.008e-02 2.006e-01 0.449 0.653487
## music 8.991e-02 1.595e-01 0.564 0.573067
## math -9.626e-02 1.973e-01 -0.488 0.625635
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3773 on 3774 degrees of freedom
## Multiple R-squared: 0.5651, Adjusted R-squared: 0.5477
## F-statistic: 32.47 on 151 and 3774 DF, p-value: < 2.2e-16
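The summary also pins down the size of the training set: 3774 residual degrees of freedom plus 151 predictors plus the intercept give n = 3926 games. The adjusted R-squared then follows from the usual penalty for the number of predictors p:

$$
n = 3774 + 151 + 1 = 3926, \qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
= 1 - (1 - 0.5651)\,\frac{3925}{3774} \approx 0.5477
$$

which matches the reported value.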
# Keep the outcome and the candidate predictors, then remove rows with NAs
tidygame <- train_set %>% select(avg_rating, year, weight , single_player , multi_player , hand_management , point_to_point_movement , set_collection , variable_player_powers , card_drafting , area_control_._area_influence , campaign_._battle_card_driven , dice_rolling , simultaneous_action_selection , route.network_building , variable_phase_order , grid_movement , storytelling , worker_placement , deck_._pool_building , player_elimination , press_your_luck , hex.and.counter , stock_holding , betting.wagering , line_drawing , rock.paper.scissors , environmental , card_game , economic , wargame , fighting , city_building , farming , murder.mystery) %>% na.omit(.)
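`step()` compares candidate models using `extractAIC()`, which for a linear model equals n log(RSS/n) + 2k, where k is the number of estimated coefficients. This differs from the value `AIC()` reports only by a constant for a fixed n, so model rankings are unaffected. With n = 3926 training rows, the intercept-only starting model gives 3926 log(1235.12 / 3926) + 2, which is about -4538.2 and matches the first line of the trace. A quick check of the formula on the built-in `mtcars` data:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
k   <- length(coef(fit))              # estimated coefficients, incl. intercept

manual_aic <- n * log(rss / n) + 2 * k  # the criterion step() minimises
extractAIC(fit)                         # returns c(edf, AIC)
all.equal(manual_aic, extractAIC(fit)[2])
```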
# Stepwise selection in both directions by AIC, between the intercept-only and full models
lm.null <- lm(avg_rating ~ 1, data = train_set)
lm.full <- lm(avg_rating ~ year + weight + single_player + multi_player + hand_management + point_to_point_movement + set_collection + variable_player_powers + card_drafting + area_control_._area_influence + campaign_._battle_card_driven + dice_rolling + simultaneous_action_selection + route.network_building + variable_phase_order + grid_movement + storytelling + worker_placement + deck_._pool_building + player_elimination + press_your_luck + hex.and.counter + stock_holding + betting.wagering + line_drawing + rock.paper.scissors + environmental + card_game + economic + wargame + fighting + city_building + farming + murder.mystery, data = train_set)
mod1 <- step(lm.null, direction = "both", scope = list(lower = lm.null, upper = lm.full))
## Start: AIC=-4538.23
## avg_rating ~ 1
##
## Df Sum of Sq RSS AIC
## + weight 1 356.71 878.41 -5874.2
## + wargame 1 83.30 1151.82 -4810.4
## + single_player 1 54.37 1180.75 -4713.0
## + hex.and.counter 1 53.00 1182.12 -4708.4
## + card_game 1 32.16 1202.96 -4639.8
## + variable_player_powers 1 30.72 1204.40 -4635.1
## + deck_._pool_building 1 23.52 1211.60 -4611.7
## + dice_rolling 1 22.84 1212.28 -4609.5
## + campaign_._battle_card_driven 1 21.58 1213.54 -4605.4
## + worker_placement 1 17.68 1217.44 -4592.8
## + economic 1 17.40 1217.72 -4591.9
## + variable_phase_order 1 14.86 1220.26 -4583.8
## + grid_movement 1 11.44 1223.68 -4572.8
## + set_collection 1 9.27 1225.85 -4565.8
## + fighting 1 8.24 1226.88 -4562.5
## + stock_holding 1 5.76 1229.36 -4554.6
## + multi_player 1 5.06 1230.06 -4552.4
## + point_to_point_movement 1 4.85 1230.27 -4551.7
## + route.network_building 1 3.90 1231.22 -4548.7
## + betting.wagering 1 3.44 1231.69 -4547.2
## + storytelling 1 3.24 1231.89 -4546.5
## + card_drafting 1 3.02 1232.10 -4545.9
## + area_control_._area_influence 1 2.59 1232.53 -4544.5
## + press_your_luck 1 2.10 1233.02 -4542.9
## + environmental 1 1.90 1233.23 -4542.3
## + rock.paper.scissors 1 1.61 1233.51 -4541.4
## + farming 1 0.75 1234.37 -4538.6
## <none> 1235.12 -4538.2
## + year 1 0.60 1234.52 -4538.1
## + line_drawing 1 0.51 1234.61 -4537.8
## + city_building 1 0.26 1234.86 -4537.1
## + simultaneous_action_selection 1 0.19 1234.93 -4536.8
## + hand_management 1 0.16 1234.96 -4536.8
## + murder.mystery 1 0.10 1235.03 -4536.5
## + player_elimination 1 0.01 1235.11 -4536.3
##
## Step: AIC=-5874.24
## avg_rating ~ weight
##
## Df Sum of Sq RSS AIC
## + single_player 1 25.08 853.33 -5986.0
## + deck_._pool_building 1 16.39 862.03 -5946.2
## + variable_player_powers 1 8.61 869.80 -5910.9
## + storytelling 1 7.23 871.19 -5904.7
## + campaign_._battle_card_driven 1 5.95 872.46 -5898.9
## + grid_movement 1 5.79 872.62 -5898.2
## + dice_rolling 1 3.39 875.02 -5887.4
## + card_drafting 1 2.71 875.71 -5884.4
## + fighting 1 2.28 876.13 -5882.5
## + player_elimination 1 1.95 876.46 -5881.0
## + hand_management 1 1.89 876.52 -5880.7
## + press_your_luck 1 1.82 876.60 -5880.4
## + wargame 1 1.69 876.72 -5879.8
## + rock.paper.scissors 1 1.58 876.83 -5879.3
## + variable_phase_order 1 1.50 876.92 -5878.9
## + area_control_._area_influence 1 1.31 877.11 -5878.1
## + city_building 1 0.88 877.53 -5876.2
## + economic 1 0.65 877.76 -5875.2
## + worker_placement 1 0.52 877.89 -5874.6
## <none> 878.41 -5874.2
## + murder.mystery 1 0.40 878.02 -5874.0
## + environmental 1 0.36 878.05 -5873.9
## + point_to_point_movement 1 0.35 878.07 -5873.8
## + set_collection 1 0.29 878.12 -5873.5
## + stock_holding 1 0.29 878.13 -5873.5
## + route.network_building 1 0.23 878.18 -5873.3
## + year 1 0.22 878.20 -5873.2
## + card_game 1 0.16 878.26 -5872.9
## + multi_player 1 0.13 878.29 -5872.8
## + farming 1 0.08 878.33 -5872.6
## + line_drawing 1 0.08 878.33 -5872.6
## + simultaneous_action_selection 1 0.05 878.37 -5872.4
## + hex.and.counter 1 0.01 878.41 -5872.3
## + betting.wagering 1 0.00 878.41 -5872.2
## - weight 1 356.71 1235.12 -4538.2
##
## Step: AIC=-5985.98
## avg_rating ~ weight + single_player
##
## Df Sum of Sq RSS AIC
## + deck_._pool_building 1 13.33 840.00 -6045.8
## + variable_player_powers 1 6.67 846.66 -6014.8
## + multi_player 1 6.54 846.79 -6014.2
## + grid_movement 1 5.58 847.75 -6009.7
## + storytelling 1 5.37 847.96 -6008.7
## + campaign_._battle_card_driven 1 4.75 848.58 -6005.9
## + card_drafting 1 2.66 850.67 -5996.2
## + hand_management 1 2.38 850.95 -5995.0
## + fighting 1 2.17 851.16 -5994.0
## + player_elimination 1 1.96 851.37 -5993.0
## + dice_rolling 1 1.88 851.45 -5992.6
## + rock.paper.scissors 1 1.80 851.53 -5992.3
## + variable_phase_order 1 1.62 851.71 -5991.4
## + wargame 1 1.34 851.99 -5990.2
## + press_your_luck 1 1.28 852.05 -5989.9
## + city_building 1 0.86 852.47 -5987.9
## + murder.mystery 1 0.44 852.89 -5986.0
## <none> 853.33 -5986.0
## + worker_placement 1 0.36 852.97 -5985.6
## + area_control_._area_influence 1 0.35 852.98 -5985.6
## + card_game 1 0.22 853.11 -5985.0
## + economic 1 0.22 853.12 -5985.0
## + point_to_point_movement 1 0.19 853.14 -5984.9
## + environmental 1 0.16 853.17 -5984.7
## + simultaneous_action_selection 1 0.09 853.24 -5984.4
## + year 1 0.09 853.24 -5984.4
## + betting.wagering 1 0.06 853.27 -5984.2
## + farming 1 0.05 853.28 -5984.2
## + set_collection 1 0.05 853.28 -5984.2
## + route.network_building 1 0.04 853.29 -5984.2
## + line_drawing 1 0.03 853.30 -5984.1
## + stock_holding 1 0.02 853.31 -5984.1
## + hex.and.counter 1 0.00 853.33 -5984.0
## - single_player 1 25.08 878.41 -5874.2
## - weight 1 327.42 1180.75 -4713.0
##
## Step: AIC=-6045.78
## avg_rating ~ weight + single_player + deck_._pool_building
##
## Df Sum of Sq RSS AIC
## + grid_movement 1 5.93 834.07 -6071.6
## + storytelling 1 5.76 834.24 -6070.8
## + variable_player_powers 1 5.66 834.34 -6070.3
## + multi_player 1 5.59 834.42 -6070.0
## + campaign_._battle_card_driven 1 5.40 834.61 -6069.1
## + wargame 1 2.65 837.35 -6056.2
## + dice_rolling 1 2.33 837.67 -6054.7
## + rock.paper.scissors 1 2.10 837.91 -6053.6
## + variable_phase_order 1 1.84 838.16 -6052.4
## + player_elimination 1 1.84 838.17 -6052.4
## + fighting 1 1.45 838.55 -6050.6
## + press_your_luck 1 1.33 838.67 -6050.0
## + hand_management 1 0.94 839.07 -6048.2
## + card_drafting 1 0.92 839.08 -6048.1
## + city_building 1 0.75 839.25 -6047.3
## + murder.mystery 1 0.46 839.54 -6045.9
## <none> 840.00 -6045.8
## + area_control_._area_influence 1 0.33 839.68 -6045.3
## + worker_placement 1 0.31 839.69 -6045.2
## + point_to_point_movement 1 0.23 839.78 -6044.8
## + environmental 1 0.20 839.80 -6044.7
## + simultaneous_action_selection 1 0.19 839.82 -6044.7
## + hex.and.counter 1 0.13 839.87 -6044.4
## + farming 1 0.12 839.89 -6044.3
## + economic 1 0.11 839.89 -6044.3
## + card_game 1 0.11 839.89 -6044.3
## + betting.wagering 1 0.11 839.89 -6044.3
## + year 1 0.04 839.96 -6044.0
## + line_drawing 1 0.03 839.97 -6043.9
## + set_collection 1 0.02 839.98 -6043.9
## + route.network_building 1 0.01 839.99 -6043.8
## + stock_holding 1 0.00 840.00 -6043.8
## - deck_._pool_building 1 13.33 853.33 -5986.0
## - single_player 1 22.03 862.03 -5946.2
## - weight 1 322.88 1162.88 -4770.9
##
## Step: AIC=-6071.59
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement
##
## Df Sum of Sq RSS AIC
## + storytelling 1 5.90 828.17 -6097.5
## + campaign_._battle_card_driven 1 5.63 828.44 -6096.2
## + multi_player 1 4.67 829.40 -6091.6
## + variable_player_powers 1 3.96 830.12 -6088.3
## + wargame 1 3.49 830.59 -6086.0
## + rock.paper.scissors 1 2.19 831.89 -6079.9
## + dice_rolling 1 1.84 832.23 -6078.3
## + variable_phase_order 1 1.83 832.24 -6078.2
## + player_elimination 1 1.54 832.53 -6076.9
## + press_your_luck 1 1.29 832.79 -6075.6
## + card_drafting 1 1.17 832.91 -6075.1
## + hand_management 1 1.05 833.02 -6074.5
## + fighting 1 0.72 833.35 -6073.0
## + city_building 1 0.65 833.43 -6072.6
## + murder.mystery 1 0.49 833.58 -6071.9
## + worker_placement 1 0.48 833.59 -6071.9
## <none> 834.07 -6071.6
## + hex.and.counter 1 0.35 833.72 -6071.2
## + point_to_point_movement 1 0.33 833.74 -6071.2
## + area_control_._area_influence 1 0.28 833.79 -6070.9
## + simultaneous_action_selection 1 0.22 833.85 -6070.6
## + environmental 1 0.15 833.92 -6070.3
## + betting.wagering 1 0.13 833.94 -6070.2
## + farming 1 0.11 833.96 -6070.1
## + year 1 0.07 834.00 -6069.9
## + economic 1 0.04 834.04 -6069.8
## + line_drawing 1 0.03 834.04 -6069.7
## + route.network_building 1 0.00 834.07 -6069.6
## + card_game 1 0.00 834.07 -6069.6
## + set_collection 1 0.00 834.07 -6069.6
## + stock_holding 1 0.00 834.07 -6069.6
## - grid_movement 1 5.93 840.00 -6045.8
## - deck_._pool_building 1 13.68 847.75 -6009.7
## - single_player 1 21.78 855.85 -5972.4
## - weight 1 317.60 1151.68 -4806.9
##
## Step: AIC=-6097.47
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling
##
## Df Sum of Sq RSS AIC
## + campaign_._battle_card_driven 1 5.64 822.53 -6122.3
## + multi_player 1 5.28 822.89 -6120.6
## + wargame 1 3.97 824.20 -6114.4
## + variable_player_powers 1 3.64 824.53 -6112.8
## + rock.paper.scissors 1 2.10 826.07 -6105.5
## + variable_phase_order 1 1.92 826.25 -6104.6
## + dice_rolling 1 1.82 826.35 -6104.1
## + player_elimination 1 1.54 826.63 -6102.8
## + press_your_luck 1 1.48 826.69 -6102.5
## + hand_management 1 1.22 826.95 -6101.2
## + card_drafting 1 1.21 826.97 -6101.2
## + fighting 1 0.75 827.42 -6099.0
## + city_building 1 0.56 827.61 -6098.1
## + hex.and.counter 1 0.44 827.73 -6097.6
## + worker_placement 1 0.43 827.74 -6097.5
## <none> 828.17 -6097.5
## + point_to_point_movement 1 0.29 827.88 -6096.8
## + simultaneous_action_selection 1 0.21 827.96 -6096.5
## + area_control_._area_influence 1 0.21 827.96 -6096.5
## + environmental 1 0.18 827.99 -6096.3
## + betting.wagering 1 0.17 828.00 -6096.3
## + murder.mystery 1 0.14 828.03 -6096.2
## + farming 1 0.14 828.04 -6096.1
## + year 1 0.06 828.11 -6095.8
## + line_drawing 1 0.04 828.13 -6095.7
## + economic 1 0.02 828.15 -6095.6
## + card_game 1 0.01 828.16 -6095.5
## + set_collection 1 0.00 828.17 -6095.5
## + stock_holding 1 0.00 828.17 -6095.5
## + route.network_building 1 0.00 828.17 -6095.5
## - storytelling 1 5.90 834.07 -6071.6
## - grid_movement 1 6.07 834.24 -6070.8
## - deck_._pool_building 1 14.09 842.26 -6033.2
## - single_player 1 19.94 848.11 -6006.1
## - weight 1 321.41 1149.58 -4812.0
##
## Step: AIC=-6122.32
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven
##
## Df Sum of Sq RSS AIC
## + multi_player 1 4.781 817.75 -6143.2
## + variable_player_powers 1 3.495 819.03 -6137.0
## + rock.paper.scissors 1 2.295 820.23 -6131.3
## + wargame 1 2.146 820.38 -6130.6
## + variable_phase_order 1 1.716 820.81 -6128.5
## + press_your_luck 1 1.584 820.94 -6127.9
## + player_elimination 1 1.439 821.09 -6127.2
## + card_drafting 1 1.402 821.12 -6127.0
## + dice_rolling 1 1.349 821.18 -6126.8
## + hand_management 1 0.993 821.53 -6125.1
## + fighting 1 0.901 821.63 -6124.6
## + worker_placement 1 0.683 821.84 -6123.6
## + hex.and.counter 1 0.669 821.86 -6123.5
## <none> 822.53 -6122.3
## + city_building 1 0.400 822.13 -6122.2
## + area_control_._area_influence 1 0.304 822.22 -6121.8
## + environmental 1 0.229 822.30 -6121.4
## + simultaneous_action_selection 1 0.222 822.30 -6121.4
## + betting.wagering 1 0.202 822.33 -6121.3
## + farming 1 0.192 822.33 -6121.2
## + murder.mystery 1 0.156 822.37 -6121.1
## + line_drawing 1 0.052 822.48 -6120.6
## + year 1 0.045 822.48 -6120.5
## + set_collection 1 0.039 822.49 -6120.5
## + stock_holding 1 0.025 822.50 -6120.4
## + route.network_building 1 0.010 822.52 -6120.4
## + card_game 1 0.010 822.52 -6120.4
## + point_to_point_movement 1 0.005 822.52 -6120.3
## + economic 1 0.001 822.53 -6120.3
## - campaign_._battle_card_driven 1 5.644 828.17 -6097.5
## - storytelling 1 5.917 828.44 -6096.2
## - grid_movement 1 6.306 828.83 -6094.3
## - deck_._pool_building 1 14.776 837.30 -6054.4
## - single_player 1 18.715 841.24 -6036.0
## - weight 1 307.932 1130.46 -4875.9
##
## Step: AIC=-6143.21
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player
##
## Df Sum of Sq RSS AIC
## + variable_player_powers 1 4.137 813.61 -6161.1
## + rock.paper.scissors 1 2.476 815.27 -6153.1
## + variable_phase_order 1 1.931 815.81 -6150.5
## + player_elimination 1 1.842 815.90 -6150.1
## + press_your_luck 1 1.787 815.96 -6149.8
## + card_drafting 1 1.498 816.25 -6148.4
## + wargame 1 1.304 816.44 -6147.5
## + dice_rolling 1 1.082 816.66 -6146.4
## + fighting 1 0.973 816.77 -6145.9
## + hand_management 1 0.875 816.87 -6145.4
## + worker_placement 1 0.742 817.00 -6144.8
## + city_building 1 0.426 817.32 -6143.3
## <none> 817.75 -6143.2
## + simultaneous_action_selection 1 0.366 817.38 -6143.0
## + betting.wagering 1 0.348 817.40 -6142.9
## + area_control_._area_influence 1 0.338 817.41 -6142.8
## + environmental 1 0.266 817.48 -6142.5
## + murder.mystery 1 0.257 817.49 -6142.4
## + hex.and.counter 1 0.248 817.50 -6142.4
## + farming 1 0.218 817.53 -6142.3
## + stock_holding 1 0.205 817.54 -6142.2
## + line_drawing 1 0.118 817.63 -6141.8
## + economic 1 0.110 817.64 -6141.7
## + year 1 0.069 817.68 -6141.5
## + route.network_building 1 0.067 817.68 -6141.5
## + set_collection 1 0.036 817.71 -6141.4
## + point_to_point_movement 1 0.002 817.74 -6141.2
## + card_game 1 0.000 817.75 -6141.2
## - multi_player 1 4.781 822.53 -6122.3
## - campaign_._battle_card_driven 1 5.141 822.89 -6120.6
## - grid_movement 1 5.346 823.09 -6119.6
## - storytelling 1 6.497 824.24 -6114.1
## - deck_._pool_building 1 13.796 831.54 -6079.5
## - single_player 1 23.422 841.17 -6034.3
## - weight 1 290.097 1107.84 -4953.2
##
## Step: AIC=-6161.12
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers
##
## Df Sum of Sq RSS AIC
## + rock.paper.scissors 1 2.875 810.73 -6173.0
## + wargame 1 1.731 811.88 -6167.5
## + press_your_luck 1 1.687 811.92 -6167.3
## + variable_phase_order 1 1.560 812.05 -6166.7
## + card_drafting 1 1.503 812.11 -6166.4
## + player_elimination 1 1.282 812.33 -6165.3
## + worker_placement 1 0.825 812.78 -6163.1
## + hex.and.counter 1 0.618 812.99 -6162.1
## + hand_management 1 0.559 813.05 -6161.8
## + dice_rolling 1 0.494 813.12 -6161.5
## <none> 813.61 -6161.1
## + area_control_._area_influence 1 0.410 813.20 -6161.1
## + stock_holding 1 0.375 813.23 -6160.9
## + city_building 1 0.359 813.25 -6160.9
## + betting.wagering 1 0.341 813.27 -6160.8
## + farming 1 0.315 813.29 -6160.6
## + simultaneous_action_selection 1 0.299 813.31 -6160.6
## + economic 1 0.259 813.35 -6160.4
## + environmental 1 0.239 813.37 -6160.3
## + murder.mystery 1 0.169 813.44 -6159.9
## + line_drawing 1 0.162 813.45 -6159.9
## + route.network_building 1 0.151 813.46 -6159.8
## + fighting 1 0.096 813.51 -6159.6
## + set_collection 1 0.079 813.53 -6159.5
## + year 1 0.035 813.57 -6159.3
## + card_game 1 0.017 813.59 -6159.2
## + point_to_point_movement 1 0.005 813.60 -6159.1
## - grid_movement 1 3.651 817.26 -6145.5
## - variable_player_powers 1 4.137 817.75 -6143.2
## - campaign_._battle_card_driven 1 4.959 818.57 -6139.3
## - multi_player 1 5.423 819.03 -6137.0
## - storytelling 1 6.184 819.79 -6133.4
## - deck_._pool_building 1 12.749 826.36 -6102.1
## - single_player 1 22.854 836.46 -6054.4
## - weight 1 276.320 1089.93 -5015.2
##
## Step: AIC=-6173.01
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors
##
## Df Sum of Sq RSS AIC
## + press_your_luck 1 1.672 809.06 -6179.1
## + wargame 1 1.604 809.13 -6178.8
## + variable_phase_order 1 1.556 809.18 -6178.6
## + card_drafting 1 1.388 809.35 -6177.7
## + player_elimination 1 1.318 809.42 -6177.4
## + worker_placement 1 0.768 809.97 -6174.7
## + simultaneous_action_selection 1 0.617 810.12 -6174.0
## + hand_management 1 0.608 810.13 -6174.0
## + hex.and.counter 1 0.581 810.15 -6173.8
## + area_control_._area_influence 1 0.449 810.29 -6173.2
## + dice_rolling 1 0.419 810.32 -6173.0
## <none> 810.73 -6173.0
## + stock_holding 1 0.411 810.32 -6173.0
## + city_building 1 0.366 810.37 -6172.8
## + betting.wagering 1 0.323 810.41 -6172.6
## + farming 1 0.303 810.43 -6172.5
## + environmental 1 0.226 810.51 -6172.1
## + economic 1 0.225 810.51 -6172.1
## + line_drawing 1 0.157 810.58 -6171.8
## + murder.mystery 1 0.155 810.58 -6171.8
## + fighting 1 0.149 810.59 -6171.7
## + route.network_building 1 0.136 810.60 -6171.7
## + set_collection 1 0.065 810.67 -6171.3
## + year 1 0.036 810.70 -6171.2
## + card_game 1 0.016 810.72 -6171.1
## + point_to_point_movement 1 0.004 810.73 -6171.0
## - rock.paper.scissors 1 2.875 813.61 -6161.1
## - grid_movement 1 3.651 814.38 -6157.4
## - variable_player_powers 1 4.536 815.27 -6153.1
## - campaign_._battle_card_driven 1 5.142 815.88 -6150.2
## - multi_player 1 5.663 816.40 -6147.7
## - storytelling 1 6.085 816.82 -6145.7
## - deck_._pool_building 1 13.035 823.77 -6112.4
## - single_player 1 23.197 833.93 -6064.3
## - weight 1 275.123 1085.86 -5027.9
##
## Step: AIC=-6179.12
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck
##
## Df Sum of Sq RSS AIC
## + wargame 1 1.686 807.38 -6185.3
## + variable_phase_order 1 1.547 807.52 -6184.6
## + card_drafting 1 1.411 807.65 -6184.0
## + player_elimination 1 1.174 807.89 -6182.8
## + worker_placement 1 0.712 808.35 -6180.6
## + hand_management 1 0.699 808.36 -6180.5
## + simultaneous_action_selection 1 0.652 808.41 -6180.3
## + hex.and.counter 1 0.593 808.47 -6180.0
## + stock_holding 1 0.425 808.64 -6179.2
## <none> 809.06 -6179.1
## + area_control_._area_influence 1 0.406 808.66 -6179.1
## + city_building 1 0.336 808.73 -6178.8
## + farming 1 0.316 808.75 -6178.7
## + betting.wagering 1 0.255 808.81 -6178.4
## + dice_rolling 1 0.245 808.82 -6178.3
## + economic 1 0.233 808.83 -6178.2
## + environmental 1 0.206 808.86 -6178.1
## + murder.mystery 1 0.186 808.88 -6178.0
## + line_drawing 1 0.183 808.88 -6178.0
## + fighting 1 0.159 808.90 -6177.9
## + route.network_building 1 0.155 808.91 -6177.9
## + set_collection 1 0.035 809.03 -6177.3
## + year 1 0.029 809.03 -6177.3
## + card_game 1 0.007 809.06 -6177.2
## + point_to_point_movement 1 0.005 809.06 -6177.1
## - press_your_luck 1 1.672 810.73 -6173.0
## - rock.paper.scissors 1 2.860 811.92 -6167.3
## - grid_movement 1 3.616 812.68 -6163.6
## - variable_player_powers 1 4.431 813.49 -6159.7
## - campaign_._battle_card_driven 1 5.234 814.30 -6155.8
## - multi_player 1 5.867 814.93 -6152.8
## - storytelling 1 6.313 815.37 -6150.6
## - deck_._pool_building 1 13.096 822.16 -6118.1
## - single_player 1 22.824 831.89 -6071.9
## - weight 1 275.526 1084.59 -5030.5
##
## Step: AIC=-6185.31
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame
##
## Df Sum of Sq RSS AIC
## + card_drafting 1 1.948 805.43 -6192.8
## + variable_phase_order 1 1.574 805.80 -6191.0
## + worker_placement 1 1.326 806.05 -6189.8
## + hand_management 1 1.118 806.26 -6188.8
## + player_elimination 1 1.041 806.33 -6188.4
## + economic 1 0.798 806.58 -6187.2
## + stock_holding 1 0.718 806.66 -6186.8
## + simultaneous_action_selection 1 0.588 806.79 -6186.2
## + farming 1 0.453 806.92 -6185.5
## <none> 807.38 -6185.3
## + route.network_building 1 0.407 806.97 -6185.3
## + betting.wagering 1 0.282 807.09 -6184.7
## + environmental 1 0.275 807.10 -6184.6
## + fighting 1 0.245 807.13 -6184.5
## + area_control_._area_influence 1 0.229 807.15 -6184.4
## + murder.mystery 1 0.216 807.16 -6184.4
## + line_drawing 1 0.191 807.19 -6184.2
## + set_collection 1 0.164 807.21 -6184.1
## + city_building 1 0.141 807.24 -6184.0
## + dice_rolling 1 0.034 807.34 -6183.5
## + year 1 0.026 807.35 -6183.4
## + point_to_point_movement 1 0.005 807.37 -6183.3
## + card_game 1 0.003 807.37 -6183.3
## + hex.and.counter 1 0.000 807.38 -6183.3
## - wargame 1 1.686 809.06 -6179.1
## - press_your_luck 1 1.755 809.13 -6178.8
## - rock.paper.scissors 1 2.729 810.10 -6174.1
## - campaign_._battle_card_driven 1 3.654 811.03 -6169.6
## - grid_movement 1 4.047 811.42 -6167.7
## - multi_player 1 4.853 812.23 -6163.8
## - variable_player_powers 1 4.854 812.23 -6163.8
## - storytelling 1 6.585 813.96 -6155.4
## - deck_._pool_building 1 14.097 821.47 -6119.4
## - single_player 1 21.461 828.84 -6084.3
## - weight 1 217.458 1024.83 -5251.0
##
## Step: AIC=-6192.8
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame + card_drafting
##
## Df Sum of Sq RSS AIC
## + variable_phase_order 1 1.503 803.92 -6198.1
## + worker_placement 1 1.194 804.23 -6196.6
## + player_elimination 1 1.019 804.41 -6195.8
## + stock_holding 1 0.797 804.63 -6194.7
## + economic 1 0.762 804.67 -6194.5
## + hand_management 1 0.760 804.67 -6194.5
## + simultaneous_action_selection 1 0.556 804.87 -6193.5
## + route.network_building 1 0.421 805.01 -6192.8
## <none> 805.43 -6192.8
## + farming 1 0.399 805.03 -6192.7
## + betting.wagering 1 0.344 805.08 -6192.5
## + fighting 1 0.308 805.12 -6192.3
## + area_control_._area_influence 1 0.290 805.14 -6192.2
## + city_building 1 0.242 805.19 -6192.0
## + murder.mystery 1 0.233 805.19 -6191.9
## + line_drawing 1 0.211 805.22 -6191.8
## + environmental 1 0.158 805.27 -6191.6
## + dice_rolling 1 0.061 805.37 -6191.1
## + card_game 1 0.041 805.39 -6191.0
## + set_collection 1 0.022 805.41 -6190.9
## + year 1 0.013 805.41 -6190.9
## + hex.and.counter 1 0.006 805.42 -6190.8
## + point_to_point_movement 1 0.005 805.42 -6190.8
## - press_your_luck 1 1.797 807.22 -6186.0
## - card_drafting 1 1.948 807.38 -6185.3
## - wargame 1 2.223 807.65 -6184.0
## - rock.paper.scissors 1 2.575 808.00 -6182.3
## - campaign_._battle_card_driven 1 3.643 809.07 -6177.1
## - grid_movement 1 4.379 809.81 -6173.5
## - multi_player 1 4.824 810.25 -6171.4
## - variable_player_powers 1 4.923 810.35 -6170.9
## - storytelling 1 6.693 812.12 -6162.3
## - deck_._pool_building 1 11.998 817.43 -6136.7
## - single_player 1 21.476 826.90 -6091.5
## - weight 1 213.946 1019.37 -5269.9
##
## Step: AIC=-6198.13
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame + card_drafting + variable_phase_order
##
## Df Sum of Sq RSS AIC
## + worker_placement 1 0.994 802.93 -6201.0
## + player_elimination 1 0.986 802.94 -6200.9
## + stock_holding 1 0.810 803.11 -6200.1
## + hand_management 1 0.746 803.18 -6199.8
## + economic 1 0.628 803.30 -6199.2
## + simultaneous_action_selection 1 0.503 803.42 -6198.6
## + route.network_building 1 0.428 803.50 -6198.2
## <none> 803.92 -6198.1
## + betting.wagering 1 0.361 803.56 -6197.9
## + fighting 1 0.354 803.57 -6197.9
## + area_control_._area_influence 1 0.346 803.58 -6197.8
## + city_building 1 0.343 803.58 -6197.8
## + farming 1 0.333 803.59 -6197.8
## + murder.mystery 1 0.252 803.67 -6197.4
## + line_drawing 1 0.215 803.71 -6197.2
## + environmental 1 0.182 803.74 -6197.0
## + dice_rolling 1 0.071 803.85 -6196.5
## + card_game 1 0.042 803.88 -6196.3
## + hex.and.counter 1 0.021 803.90 -6196.2
## + set_collection 1 0.011 803.91 -6196.2
## + year 1 0.010 803.91 -6196.2
## + point_to_point_movement 1 0.001 803.92 -6196.1
## - variable_phase_order 1 1.503 805.43 -6192.8
## - press_your_luck 1 1.787 805.71 -6191.4
## - card_drafting 1 1.877 805.80 -6191.0
## - wargame 1 2.242 806.17 -6189.2
## - rock.paper.scissors 1 2.573 806.50 -6187.6
## - campaign_._battle_card_driven 1 3.483 807.41 -6183.2
## - grid_movement 1 4.413 808.34 -6178.6
## - variable_player_powers 1 4.527 808.45 -6178.1
## - multi_player 1 4.979 808.90 -6175.9
## - storytelling 1 6.803 810.73 -6167.0
## - deck_._pool_building 1 12.235 816.16 -6140.8
## - single_player 1 21.730 825.65 -6095.4
## - weight 1 206.632 1010.56 -5302.1
##
## Step: AIC=-6200.99
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame + card_drafting + variable_phase_order +
## worker_placement
##
## Df Sum of Sq RSS AIC
## + player_elimination 1 1.001 801.93 -6203.9
## + stock_holding 1 0.963 801.97 -6203.7
## + hand_management 1 0.839 802.09 -6203.1
## + route.network_building 1 0.557 802.37 -6201.7
## + simultaneous_action_selection 1 0.510 802.42 -6201.5
## + economic 1 0.474 802.46 -6201.3
## + city_building 1 0.425 802.51 -6201.1
## + fighting 1 0.415 802.52 -6201.0
## <none> 802.93 -6201.0
## + area_control_._area_influence 1 0.395 802.54 -6200.9
## + betting.wagering 1 0.365 802.57 -6200.8
## + murder.mystery 1 0.280 802.65 -6200.4
## + farming 1 0.236 802.69 -6200.1
## + line_drawing 1 0.224 802.71 -6200.1
## + environmental 1 0.166 802.76 -6199.8
## + dice_rolling 1 0.073 802.86 -6199.3
## + hex.and.counter 1 0.047 802.88 -6199.2
## + card_game 1 0.017 802.91 -6199.1
## + year 1 0.006 802.92 -6199.0
## + set_collection 1 0.001 802.93 -6199.0
## + point_to_point_movement 1 0.001 802.93 -6199.0
## - worker_placement 1 0.994 803.92 -6198.1
## - variable_phase_order 1 1.303 804.23 -6196.6
## - press_your_luck 1 1.731 804.66 -6194.5
## - card_drafting 1 1.763 804.69 -6194.4
## - rock.paper.scissors 1 2.494 805.42 -6190.8
## - wargame 1 2.797 805.73 -6189.3
## - campaign_._battle_card_driven 1 3.546 806.48 -6185.7
## - grid_movement 1 4.685 807.62 -6180.1
## - variable_player_powers 1 4.714 807.64 -6180.0
## - multi_player 1 4.894 807.82 -6179.1
## - storytelling 1 6.764 809.69 -6170.1
## - deck_._pool_building 1 12.382 815.31 -6142.9
## - single_player 1 21.299 824.23 -6100.2
## - weight 1 186.807 989.74 -5381.8
##
## Step: AIC=-6203.88
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame + card_drafting + variable_phase_order +
## worker_placement + player_elimination
##
## Df Sum of Sq RSS AIC
## + stock_holding 1 0.962 800.97 -6206.6
## + hand_management 1 0.764 801.17 -6205.6
## + route.network_building 1 0.546 801.38 -6204.6
## + economic 1 0.474 801.46 -6204.2
## <none> 801.93 -6203.9
## + city_building 1 0.405 801.52 -6203.9
## + simultaneous_action_selection 1 0.405 801.52 -6203.9
## + area_control_._area_influence 1 0.396 801.53 -6203.8
## + betting.wagering 1 0.358 801.57 -6203.6
## + fighting 1 0.302 801.63 -6203.4
## + murder.mystery 1 0.272 801.66 -6203.2
## + line_drawing 1 0.239 801.69 -6203.1
## + farming 1 0.234 801.70 -6203.0
## + environmental 1 0.178 801.75 -6202.8
## + dice_rolling 1 0.090 801.84 -6202.3
## + hex.and.counter 1 0.057 801.87 -6202.2
## + card_game 1 0.038 801.89 -6202.1
## + set_collection 1 0.005 801.92 -6201.9
## + year 1 0.005 801.92 -6201.9
## + point_to_point_movement 1 0.000 801.93 -6201.9
## - player_elimination 1 1.001 802.93 -6201.0
## - worker_placement 1 1.009 802.94 -6200.9
## - variable_phase_order 1 1.271 803.20 -6199.7
## - press_your_luck 1 1.592 803.52 -6198.1
## - card_drafting 1 1.742 803.67 -6197.4
## - rock.paper.scissors 1 2.528 804.46 -6193.5
## - wargame 1 2.633 804.56 -6193.0
## - campaign_._battle_card_driven 1 3.520 805.45 -6188.7
## - variable_player_powers 1 4.169 806.10 -6185.5
## - grid_movement 1 4.523 806.45 -6183.8
## - multi_player 1 5.174 807.10 -6180.6
## - storytelling 1 6.780 808.71 -6172.8
## - deck_._pool_building 1 12.272 814.20 -6146.3
## - single_player 1 21.635 823.56 -6101.4
## - weight 1 187.800 989.73 -5379.8
##
## Step: AIC=-6206.59
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame + card_drafting + variable_phase_order +
## worker_placement + player_elimination + stock_holding
##
## Df Sum of Sq RSS AIC
## + hand_management 1 0.866 800.10 -6208.8
## + simultaneous_action_selection 1 0.442 800.53 -6206.8
## <none> 800.97 -6206.6
## + fighting 1 0.360 800.61 -6206.4
## + betting.wagering 1 0.354 800.61 -6206.3
## + city_building 1 0.352 800.62 -6206.3
## + area_control_._area_influence 1 0.347 800.62 -6206.3
## + murder.mystery 1 0.294 800.67 -6206.0
## + farming 1 0.253 800.71 -6205.8
## + line_drawing 1 0.251 800.72 -6205.8
## + route.network_building 1 0.222 800.75 -6205.7
## + environmental 1 0.198 800.77 -6205.6
## + economic 1 0.197 800.77 -6205.6
## + dice_rolling 1 0.113 800.85 -6205.1
## + hex.and.counter 1 0.084 800.88 -6205.0
## + card_game 1 0.026 800.94 -6204.7
## + set_collection 1 0.008 800.96 -6204.6
## + year 1 0.003 800.97 -6204.6
## + point_to_point_movement 1 0.000 800.97 -6204.6
## - stock_holding 1 0.962 801.93 -6203.9
## - player_elimination 1 1.000 801.97 -6203.7
## - worker_placement 1 1.162 802.13 -6202.9
## - variable_phase_order 1 1.271 802.24 -6202.4
## - press_your_luck 1 1.618 802.59 -6200.7
## - card_drafting 1 1.816 802.78 -6199.7
## - rock.paper.scissors 1 2.559 803.53 -6196.1
## - wargame 1 3.092 804.06 -6193.5
## - campaign_._battle_card_driven 1 3.545 804.51 -6191.3
## - variable_player_powers 1 4.513 805.48 -6186.5
## - grid_movement 1 4.617 805.59 -6186.0
## - multi_player 1 5.634 806.60 -6181.1
## - storytelling 1 6.872 807.84 -6175.1
## - deck_._pool_building 1 12.496 813.46 -6147.8
## - single_player 1 22.343 823.31 -6100.6
## - weight 1 170.137 971.11 -5452.4
##
## Step: AIC=-6208.84
## avg_rating ~ weight + single_player + deck_._pool_building +
## grid_movement + storytelling + campaign_._battle_card_driven +
## multi_player + variable_player_powers + rock.paper.scissors +
## press_your_luck + wargame + card_drafting + variable_phase_order +
## worker_placement + player_elimination + stock_holding + hand_management
##
## Df Sum of Sq RSS AIC
## <none> 800.10 -6208.8
## + area_control_._area_influence 1 0.378 799.72 -6208.7
## + simultaneous_action_selection 1 0.377 799.72 -6208.7
## + betting.wagering 1 0.350 799.75 -6208.6
## + fighting 1 0.347 799.76 -6208.5
## + city_building 1 0.342 799.76 -6208.5
## + murder.mystery 1 0.298 799.80 -6208.3
## + line_drawing 1 0.285 799.82 -6208.2
## + card_game 1 0.276 799.83 -6208.2
## + route.network_building 1 0.243 799.86 -6208.0
## + farming 1 0.222 799.88 -6207.9
## + economic 1 0.215 799.89 -6207.9
## + environmental 1 0.184 799.92 -6207.7
## + dice_rolling 1 0.170 799.93 -6207.7
## + hex.and.counter 1 0.125 799.98 -6207.5
## + set_collection 1 0.002 800.10 -6206.9
## + year 1 0.001 800.10 -6206.8
## + point_to_point_movement 1 0.000 800.10 -6206.8
## - hand_management 1 0.866 800.97 -6206.6
## - player_elimination 1 0.920 801.02 -6206.3
## - stock_holding 1 1.063 801.17 -6205.6
## - variable_phase_order 1 1.249 801.35 -6204.7
## - worker_placement 1 1.272 801.37 -6204.6
## - card_drafting 1 1.451 801.55 -6203.7
## - press_your_luck 1 1.730 801.83 -6202.4
## - rock.paper.scissors 1 2.612 802.71 -6198.0
## - campaign_._battle_card_driven 1 3.256 803.36 -6194.9
## - wargame 1 3.543 803.64 -6193.5
## - variable_player_powers 1 4.218 804.32 -6190.2
## - grid_movement 1 4.828 804.93 -6187.2
## - multi_player 1 5.382 805.48 -6184.5
## - storytelling 1 7.060 807.16 -6176.4
## - deck_._pool_building 1 11.656 811.76 -6154.1
## - single_player 1 22.511 822.61 -6101.9
## - weight 1 170.638 970.74 -5451.9
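The trace above looks like a bidirectional AIC search (rows marked `+` try adding a term, rows marked `-` try dropping one), which is consistent with `stats::step()` using `direction = "both"`; the call itself is not shown in this section, so treat that as an assumption. The search stops at the last step, where `<none>` has the lowest AIC. That selected formula differs from the one typed into `mod1` below: `mod1` adds `year`, `hex.and.counter`, `city_building` and several others, and omits `variable_phase_order`, `worker_placement`, `player_elimination` and `hand_management`. A minimal sketch, on simulated data with hypothetical stand-in columns, of running the search and reusing its formula with `formula()` instead of retyping the variable list:

```r
# Minimal sketch: AIC-based stepwise selection with stats::step(), run on
# simulated data. The columns below are hypothetical stand-ins, not BGG data.
set.seed(42)
n <- 500
sim <- data.frame(
  weight = runif(n, 1, 5),            # plays the role of game complexity
  single_player = rbinom(n, 1, 0.3),  # 0/1 mechanic indicator
  wargame = rbinom(n, 1, 0.2),        # 0/1 category indicator
  noise = rbinom(n, 1, 0.5)           # unrelated to the response
)
sim$avg_rating <- 5.9 + 0.33 * sim$weight + 0.24 * sim$single_player +
  rnorm(n, sd = 0.45)

# start from the intercept-only model; "both" lets step() add and drop terms
null_mod <- lm(avg_rating ~ 1, data = sim)
step_mod <- step(null_mod,
                 scope = ~ weight + single_player + wargame + noise,
                 direction = "both", trace = 0)

# reuse the selected formula instead of retyping the variable list
selected <- formula(step_mod)
final_mod <- lm(selected, data = sim)
print(selected)
```

Refitting with `formula(step_mod)` guarantees the final model matches what the search actually chose.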
#finalized linear model
mod1 <- lm(avg_rating ~ weight + year + wargame + single_player +
deck_._pool_building + storytelling + grid_movement + hex.and.counter +
campaign_._battle_card_driven + multi_player + city_building +
rock.paper.scissors + simultaneous_action_selection + area_control_._area_influence +
line_drawing + betting.wagering + route.network_building +
press_your_luck + murder.mystery + fighting + card_drafting +
stock_holding + variable_player_powers + card_game + point_to_point_movement,
data = train_set)
summary(mod1)
##
## Call:
## lm(formula = avg_rating ~ weight + year + wargame + single_player +
## deck_._pool_building + storytelling + grid_movement + hex.and.counter +
## campaign_._battle_card_driven + multi_player + city_building +
## rock.paper.scissors + simultaneous_action_selection + area_control_._area_influence +
## line_drawing + betting.wagering + route.network_building +
## press_your_luck + murder.mystery + fighting + card_drafting +
## stock_holding + variable_player_powers + card_game + point_to_point_movement,
## data = train_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.45454 -0.30833 -0.01586 0.29416 1.74102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.947e+00 1.174e-01 50.643 < 2e-16 ***
## weight 3.302e-01 1.168e-02 28.267 < 2e-16 ***
## year 2.129e-05 5.706e-05 0.373 0.709146
## wargame 7.426e-02 2.706e-02 2.744 0.006097 **
## single_player 2.380e-01 2.285e-02 10.415 < 2e-16 ***
## deck_._pool_building 2.606e-01 3.437e-02 7.581 4.25e-14 ***
## storytelling 3.459e-01 6.216e-02 5.564 2.81e-08 ***
## grid_movement 1.393e-01 3.146e-02 4.428 9.77e-06 ***
## hex.and.counter 6.086e-03 3.368e-02 0.181 0.856636
## campaign_._battle_card_driven 1.747e-01 4.106e-02 4.254 2.15e-05 ***
## multi_player 8.879e-02 1.661e-02 5.346 9.51e-08 ***
## city_building -2.900e-02 3.532e-02 -0.821 0.411666
## rock.paper.scissors -3.107e-01 8.173e-02 -3.802 0.000146 ***
## simultaneous_action_selection 5.108e-02 2.778e-02 1.838 0.066090 .
## area_control_._area_influence -2.345e-02 2.341e-02 -1.002 0.316491
## line_drawing 1.154e-01 1.078e-01 1.071 0.284283
## betting.wagering 7.809e-02 5.507e-02 1.418 0.156286
## route.network_building 3.626e-02 3.664e-02 0.989 0.322534
## press_your_luck 1.164e-01 3.935e-02 2.958 0.003110 **
## murder.mystery 7.515e-02 6.378e-02 1.178 0.238788
## fighting 3.617e-02 2.637e-02 1.371 0.170311
## card_drafting 7.951e-02 2.371e-02 3.353 0.000806 ***
## stock_holding 8.481e-02 5.239e-02 1.619 0.105596
## variable_player_powers 9.357e-02 2.186e-02 4.280 1.92e-05 ***
## card_game -6.678e-03 1.835e-02 -0.364 0.715951
## point_to_point_movement -7.111e-03 3.225e-02 -0.221 0.825465
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4535 on 3900 degrees of freedom
## Multiple R-squared: 0.3506, Adjusted R-squared: 0.3464
## F-statistic: 84.22 on 25 and 3900 DF, p-value: < 2.2e-16
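As a worked reading of the summary above: each 0/1 mechanic or category indicator shifts the predicted rating by its estimate, and weight enters linearly. Comparing two otherwise identical games that differ only in weight (2 versus 4), using estimates rounded from the table:

$$
\widehat{\text{avg\_rating}} = 5.947 + 0.3302\,\text{weight} + 0.2380\,\text{single\_player} - 0.3107\,\text{rock.paper.scissors} + \cdots
$$

$$
\Delta\widehat{\text{avg\_rating}} = 0.3302 \times (4 - 2) \approx 0.66
$$

With an adjusted R-squared of 0.346, these features explain roughly a third of the variation in average rating on the training set.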
#predictions vs actual rating
predictions <- predict(mod1, test_set)
cor(predictions, test_set$avg_rating)
## [1] 0.6160393
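The correlation above is not directly comparable to the RMSE and R-squared reported for the random forest later in this section; squaring it (0.616² ≈ 0.38) only approximates out-of-sample R-squared, because correlation ignores bias and shrinkage in the predictions. A minimal sketch of one helper, `regression_metrics` (a hypothetical name), that computes all three metrics the same way for every model:

```r
# Minimal sketch: one helper for the test-set metrics used in this section.
regression_metrics <- function(actual, predicted) {
  sse <- sum((actual - predicted)^2)      # sum of squared errors
  sst <- sum((actual - mean(actual))^2)   # total sum of squares
  c(rmse = sqrt(sse / length(actual)),
    r_squared = 1 - sse / sst,            # out-of-sample R^2
    correlation = cor(actual, predicted))
}

# toy example: predictions perfectly correlated with the truth,
# but shrunk toward the mean
actual <- c(6, 7, 8)
predicted <- c(6.5, 7, 7.5)
regression_metrics(actual, predicted)
# rmse about 0.408, r_squared 0.75, correlation 1
```

The toy case shows why the distinction matters: correlation is 1 while R-squared is only 0.75. Calling `regression_metrics(test_set$avg_rating, predict(mod1, test_set))` would put the linear model on the same footing as the forest's test RMSE and R-squared.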
#graph of predictions vs actual rating
data.frame(actual = test_set$avg_rating, predicted = predictions) %>%
ggplot(aes(actual, predicted)) +
geom_point(color = "pink") +
stat_ellipse(color = "pink") +
xlab("Actual Average Rating") +
ylab("Predicted Average Rating") +
ggtitle("Actual vs Predicted Average Rating")
#graph of linear model coefficients, ordered from most negative to most positive
mod_coef <- data.frame(coefficient = coef(mod1)[-1]) # drop the intercept
mod_coef <- mod_coef %>%
mutate(variable = rownames(mod_coef))
ggplot(mod_coef, aes(x = reorder(variable, coefficient), y = coefficient)) +
geom_bar(stat = "identity", fill = "grey50") +
coord_flip() +
labs(title = "Final Linear Model Coefficients", y = "Coefficient", x = "Variable") +
theme_light()
# train knn
knnFit <- train(avg_rating ~.,
data = train_set,
method = "knn",
na.action=na.exclude,
trControl = control,
preProcess = c("center", "scale"),
tuneLength = 10)
knnFit
## k-Nearest Neighbors
##
## 3926 samples
## 151 predictor
##
## Pre-processing: centered (151), scaled (151)
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 3730, 3729, 3730, 3730, 3730, 3729, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.4825633 0.2887958 0.3764107
## 7 0.4760223 0.2996555 0.3733424
## 9 0.4737251 0.3037911 0.3736822
## 11 0.4717253 0.3106384 0.3725916
## 13 0.4700448 0.3159743 0.3715325
## 15 0.4695079 0.3175796 0.3715943
## 17 0.4687577 0.3210255 0.3712425
## 19 0.4694049 0.3205389 0.3716843
## 21 0.4682217 0.3259626 0.3703395
## 23 0.4686133 0.3268853 0.3703274
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
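Conceptually, a k-nearest-neighbour regressor predicts a game's rating as the average rating of the k training games closest to it, here after centering and scaling the predictors. A minimal sketch of that idea with a hypothetical helper `knn_regress`; it is not caret's implementation, which may differ in details such as tie handling:

```r
# Minimal sketch of k-nearest-neighbour regression with centering and scaling.
# Hypothetical helper, not caret's implementation.
knn_regress <- function(train_x, train_y, query_x, k) {
  train_x <- as.matrix(train_x)
  query_x <- as.matrix(query_x)
  # center and scale with training-set statistics only
  centers <- colMeans(train_x)
  scales <- apply(train_x, 2, sd)
  scales[scales == 0] <- 1               # avoid dividing by zero for constant columns
  tx <- scale(train_x, center = centers, scale = scales)
  qx <- scale(query_x, center = centers, scale = scales)
  apply(qx, 1, function(q) {
    d <- sqrt(colSums((t(tx) - q)^2))    # Euclidean distance to each training row
    mean(train_y[order(d)[seq_len(k)]])  # average rating of the k closest games
  })
}

train_x <- data.frame(weight = c(1, 2, 3, 10, 11, 12))
train_y <- c(6, 6, 6, 8, 8, 8)
knn_regress(train_x, train_y, data.frame(weight = c(2, 11)), k = 3)
# 6 8
```

Scaling matters because otherwise a wide-range column such as year would dominate the distance over the 0/1 mechanic indicators.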
#plot it
knnFit %>% ggplot() +
geom_point(color='pink') +
geom_line(color ='pink')
#prediction
knnPredict <- predict(knnFit,newdata = test_set) # 932
#plot actual vs prediction
knn_df <- data.frame(avg_rating = test_set$avg_rating, knnPredict = knnPredict)
knn_df %>% ggplot(aes(avg_rating, knnPredict)) +
geom_point(color = 'pink') +
xlim(c(6,8)) +
ylim(c(6,8)) +
ylab('KNN_prediction') +
xlab('Actual average_rating')
## Warning: Removed 54 rows containing missing values (geom_point).
# select needed columns
train_set_select <- train_set[, c(2:8, 11, 13:152)]
# run randomForest with all features
fit <- randomForest(avg_rating ~ .,
data = train_set_select,
ntree = 500)
# plot feature importance
impt <- as.data.frame(importance(fit))
impt$variable <- rownames(impt) # importance() names its rows after the predictors
impt <- transform(impt, variable = reorder(variable, IncNodePurity))
impt %>%
ggplot() +
geom_bar(aes(y = IncNodePurity, x = variable), stat = 'identity') +
coord_flip() +
ylab("Feature Importance") +
xlab("Feature") +
ggtitle("Board Game Features Ranked by Importance") +
theme(text = element_text(size=2))
Top 20 Most Important Features
# select top 20 most important features
impt_feature <- impt %>%
arrange(desc(IncNodePurity)) %>%
head(20) %>%
pull(variable)
impt_feature <- as.character(impt_feature)
impt_feature_df <- impt %>%
filter(variable %in% impt_feature)
impt_feature_df %>%
ggplot() +
geom_bar(aes(y = IncNodePurity, x = variable), stat = 'identity') +
coord_flip() +
ylab("Feature Importance") +
xlab("Feature") +
ggtitle("Board Game Features Ranked by Importance")
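`IncNodePurity` is an impurity-based importance, which can favor variables with many possible split points (such as the continuous weight and year) over 0/1 indicators. `randomForest` can also report a permutation-based `%IncMSE` when fit with `importance = TRUE`. A minimal, model-agnostic sketch of permutation importance, with a hypothetical helper name and an `lm` on simulated data standing in for the forest:

```r
# Minimal sketch of permutation importance: shuffle one column at a time and
# measure how much the held-out RMSE increases. Works with any model that has
# a predict() method; the lm below is a hypothetical stand-in for the forest.
permutation_importance <- function(model, data, response, seed = 1) {
  set.seed(seed)
  rmse <- function(d) sqrt(mean((predict(model, d) - d[[response]])^2))
  baseline <- rmse(data)
  features <- setdiff(names(data), response)
  sapply(features, function(f) {
    shuffled <- data
    shuffled[[f]] <- sample(shuffled[[f]])  # break the link between f and the response
    rmse(shuffled) - baseline
  })
}

set.seed(7)
n <- 400
sim <- data.frame(weight = runif(n, 1, 5), noise = rbinom(n, 1, 0.5))
sim$avg_rating <- 5.9 + 0.33 * sim$weight + rnorm(n, sd = 0.45)
train_rows <- 1:300
fit_lm <- lm(avg_rating ~ weight + noise, data = sim[train_rows, ])
permutation_importance(fit_lm, sim[-train_rows, ], "avg_rating")
```

Because it is measured on held-out rows, a feature the model never relies on scores near zero regardless of how many split points it offers.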
# fit the random forest model
# change number of variables randomly sampled as candidates at each split
RMSE_mtry <- c()
for (m in 1:30) {
fit <- randomForest(avg_rating ~ .,
data = train_set_select,
ntree = 100,
mtry = m)
# make predictions on test set
predictions <- predict(fit, test_set)
# calculate RMSE
RMSE <- sqrt(sum((predictions - test_set$avg_rating)^2)/length(predictions))
print(RMSE)
RMSE_mtry <- c(RMSE_mtry, RMSE)
}
## [1] 0.4962323
## [1] 0.4414347
## [1] 0.4199963
## [1] 0.4072601
## [1] 0.3996708
## [1] 0.3900891
## [1] 0.3839525
## [1] 0.3820635
## [1] 0.3807499
## [1] 0.3791024
## [1] 0.3781362
## [1] 0.3766863
## [1] 0.3740838
## [1] 0.3728372
## [1] 0.3704151
## [1] 0.3693029
## [1] 0.3713737
## [1] 0.3705476
## [1] 0.3703653
## [1] 0.3715319
## [1] 0.3688442
## [1] 0.3695915
## [1] 0.3713514
## [1] 0.3704681
## [1] 0.3694058
## [1] 0.3683989
## [1] 0.3683301
## [1] 0.369796
## [1] 0.371918
## [1] 0.3686873
# get minimum RMSE
which.min(RMSE_mtry)
## [1] 27
# plot RMSE with mtry
ggplot() +
geom_line(aes(x = 1:30, y = RMSE_mtry)) +
geom_point(aes(x = 1:30, y = RMSE_mtry)) +
ggtitle("Choose the Best Number of Variables to Include at Each Split") +
xlab("mtry") +
ylab("RMSE")
# change number of trees to grow
RMSE_ntree <- c()
for (n in 1:10) {
fit <- randomForest(avg_rating ~ .,
data = train_set_select,
ntree = n * 50,
mtry = which.min(RMSE_mtry))
# make predictions on test set
predictions <- predict(fit, test_set)
# calculate RMSE
RMSE <- sqrt(sum((predictions - test_set$avg_rating)^2)/length(predictions))
print(RMSE)
RMSE_ntree <- c(RMSE_ntree, RMSE)
}
## [1] 0.371073
## [1] 0.368581
## [1] 0.3701315
## [1] 0.3686847
## [1] 0.3681581
## [1] 0.3694296
## [1] 0.3682721
## [1] 0.3678488
## [1] 0.3671117
## [1] 0.3682236
ggplot() +
geom_line(aes(x = seq(50, 500, 50), y = RMSE_ntree)) +
geom_point(aes(x = seq(50, 500, 50), y = RMSE_ntree)) +
ggtitle("Choose the Best Number of Trees to Grow") +
xlab("ntree") +
ylab("RMSE")
which.min(RMSE_ntree)
## [1] 9
min(RMSE_ntree)
## [1] 0.3671117
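Both loops above choose `mtry` and `ntree` by RMSE on `test_set`, so the test RMSE of the final model is likely somewhat optimistic. A minimal sketch of choosing between candidates by k-fold cross-validation on the training data only; `cv_rmse` is a hypothetical helper, and two `lm` candidates on simulated data stand in for forests with different settings:

```r
# Minimal sketch: k-fold cross-validated RMSE computed on training data only,
# so that the test set is touched once, after tuning.
cv_rmse <- function(data, fit_fun, response, k = 5, seed = 1) {
  set.seed(seed)  # same folds for every candidate
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  sq_err <- numeric(0)
  for (f in seq_len(k)) {
    model <- fit_fun(data[folds != f, , drop = FALSE])
    held_out <- data[folds == f, , drop = FALSE]
    sq_err <- c(sq_err, (predict(model, held_out) - held_out[[response]])^2)
  }
  sqrt(mean(sq_err))
}

# toy comparison of two candidate models on simulated (hypothetical) data
set.seed(3)
sim <- data.frame(weight = runif(300, 1, 5))
sim$avg_rating <- 5.9 + 0.33 * sim$weight + rnorm(300, sd = 0.45)
candidates <- list(
  intercept_only = function(d) lm(avg_rating ~ 1, data = d),
  with_weight    = function(d) lm(avg_rating ~ weight, data = d)
)
sapply(candidates, cv_rmse, data = sim, response = "avg_rating")
```

For the forest, each candidate could be `function(d) randomForest(avg_rating ~ ., data = d, ntree = 100, mtry = m)` evaluated on `train_set_select`. Alternatively, the out-of-bag error stored in a regression forest's `mse` component avoids refitting altogether.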
Fit a model using our best mtry and ntree.
# best model
fit <- randomForest(avg_rating ~ .,
data = train_set_select,
ntree = which.min(RMSE_ntree) * 50,
mtry = which.min(RMSE_mtry))
# make predictions
predictions <- predict(fit, test_set)
# calculate new RMSE
RMSE <- sqrt(sum((predictions - test_set$avg_rating)^2)/length(predictions))
print(RMSE)
## [1] 0.3683742
# R^2
R_2 <- 1 - sum((test_set$avg_rating - predictions)^2) / sum((test_set$avg_rating - mean(test_set$avg_rating))^2)
R_2
## [1] 0.5606712
# plot
ggplot() +
geom_point(aes(x = test_set$avg_rating, y = predictions), col = "pink") +
ggtitle("Predictions vs True Average Ratings") +
xlab("True Average Rating") +
ylab("Predicted Average Rating")
### support vector machine
svmFit <- train(avg_rating ~.,
data = train_set,
method = "svmLinear",
na.action=na.exclude,
trControl = control,
preProcess = c("center", "scale"),
tuneLength = 10)
svmFit
## Support Vector Machines with Linear Kernel
##
## 3926 samples
## 151 predictor
##
## Pre-processing: centered (151), scaled (151)
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 3729, 3729, 3730, 3730, 3730, 3730, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.3929209 0.5224474 0.2931945
##
## Tuning parameter 'C' was held constant at a value of 1
svmPredict <- predict(svmFit,newdata = test_set)
#plot actual vs prediction
svm_df <- data.frame(svmPredict = svmPredict, avg_rating = test_set$avg_rating)
svm_df %>% ggplot(aes(avg_rating, svmPredict)) +
geom_point(color = 'hotpink2') +
xlim(c(6,8)) +
ylim(c(6,8)) +
xlab('Actual Average Rating') +
ylab('SVM Predicted Average Rating')
## Warning: Removed 67 rows containing missing values (geom_point).
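The warning comes from the axis limits: `xlim()` and `ylim()` set scale limits, so ggplot2 treats any point outside [6, 8] on either axis as missing and drops it, along with rows that are already `NA`. A minimal base-R sketch of that counting rule, using made-up ratings and a hypothetical helper `count_dropped()`:

```r
# count rows that xlim(c(6, 8)) + ylim(c(6, 8)) would drop:
# any NA, or any value outside [lo, hi] on either axis
count_dropped <- function(x, y, lo = 6, hi = 8) {
  outside <- is.na(x) | is.na(y) | x < lo | x > hi | y < lo | y > hi
  sum(outside)
}
# made-up actual and predicted ratings, for illustration only
actual    <- c(5.5, 6.2, 7.1, 8.4, 6.9, NA)
predicted <- c(6.1, 6.3, 7.0, 7.8, 8.2, 7.0)
count_dropped(actual, predicted)  # 4
```

To zoom into the 6 to 8 window without discarding any points, `coord_cartesian(xlim = c(6, 8), ylim = c(6, 8))` can be used instead of `xlim()`/`ylim()`.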
We also built a simple board game recommender: a user enters their favorite board game, and we recommend several board games they might like, based on how similar those games are to the favorite.
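The core of the recommender is a nearest-neighbour search over 0/1 feature vectors. A minimal base-R sketch on a made-up matrix of four hypothetical games and four features shows the ranking step; the project's `get_most_simi()` additionally keeps only the closest 2% of games and re-sorts them by `geek_rating`.

```r
# toy 0/1 indicator matrix: rows are (hypothetical) games, columns are features
features <- rbind(
  GameA = c(1, 1, 0, 0),
  GameB = c(1, 1, 0, 1),
  GameC = c(0, 0, 1, 1),
  GameD = c(1, 0, 0, 0)
)
favorite <- "GameA"
# Euclidean distance from the favorite game to every game (row)
d <- sqrt(rowSums(sweep(features, 2, features[favorite, ])^2))
# drop the favorite itself, then rank the rest from most to least similar
d <- d[names(d) != favorite]
ranked <- names(d)[order(d)]
ranked  # "GameB" "GameD" "GameC"
```

Because the features are 0/1 indicators, the squared Euclidean distance between two games is simply the number of categories and mechanics on which they differ. `sweep()` subtracts the favorite's vector from every row at once, so no explicit loop over games is needed.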
# import data
df <- read.csv("df_recode_final_1127", sep = "|")
# only keep data on or after 1980
df1 <- df %>% filter(year >= 1980)
# omit na
df1_na_omit <- na.omit(df1)
# drop columns not wanted
drops <- c("rank", "bgg_url", "game_id", "image_url", "mechanic", "category", "designer")
df_rec <- df1_na_omit[ , !(names(df1_na_omit) %in% drops)]
# Recommender function
# find board games similar to a given game, using the Euclidean distance
# between their mechanic and category indicator columns
get_most_simi <- function(game_name, df) {
  # keep the game names (column 1) plus the mechanic and category columns
  df_new <- df[, c(1, 20:153)]
  if (game_name %in% df_new$names) {
    print("Yay your game is found!")
    # numeric feature matrix, one row per game (name column dropped)
    features <- data.matrix(df_new[, -1])
    # feature vector of the user's favorite game (first match if the name is duplicated)
    game_played <- features[which(df_new$names == game_name)[1], ]
    # Euclidean distance from the favorite game to every game, computed in one step
    dists <- sqrt(rowSums(sweep(features, 2, game_played)^2))
    score <- data.frame(score = unname(dists), game = as.character(df_new$names))
    score <- score[order(score$score), ]
    # keep the closest 2% of games, excluding the favorite itself (distance 0)
    similar_games <- score %>%
      filter(score < quantile(score, 0.02) &
               score != 0)
    # among the similar games, recommend the 10 with the highest geek rating
    game_list <- df %>%
      filter(names %in% similar_games$game)
    recommendations <- game_list[order(game_list$geek_rating, decreasing = TRUE), ] %>%
      select(names) %>%
      head(10)
    return(recommendations)
  }
  else
    print("Loading should only take a few seconds! If no games appear, please try another :)")
}
# example
# input: Kingdom Builder
get_most_simi("Kingdom Builder", df_rec)
## [1] "Yay your game is found!"
## names
## 1 Carcassonne
## 2 Web of Power
## 3 Carcassonne: The Castle
## 4 Domaine
## 5 Gold West
## 6 Rattus
## 7 Löwenherz
## 8 Fjords
## 9 Barony
## 10 Guilds of London
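The filtering step in `get_most_simi()` keeps games whose distance is below the 2% quantile of all distances and drops the favorite itself (distance 0). A minimal sketch of that rule on made-up distances for 200 games, where the first entry plays the role of the favorite:

```r
# made-up distances from a favorite game to 200 games; entry 1 is the game itself
set.seed(42)
scores <- c(0, runif(199, min = 1, max = 10))
# 2% quantile of all distances (R's default type-7 quantile)
cutoff <- quantile(scores, 0.02)
# keep games closer than the cutoff, excluding the favorite (distance 0)
keep <- scores < cutoff & scores != 0
sum(keep)  # 3
```

In the project's function, the games that pass this cutoff are then re-sorted by `geek_rating` before `head(10)`, so the list favors well-rated games among the near neighbours rather than strictly the nearest ones. If fewer than 10 games pass the cutoff, fewer than 10 recommendations are returned.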
We then wrapped the recommender in a Shiny app: the user enters the name of a board game, and the app outputs a table of 10 recommended board games.
# shiny app
library(shiny)
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      # add a title
      titlePanel("Board Game Recommender"),
      textInput("text", label = h3("Game"), value = "Name a game :)")
    ),
    mainPanel(
      titlePanel("Recommended Games"),
      tableOutput("table")
    )
  )
)
server <- function(input, output) {
  # recompute the recommendation table whenever the input text changes
  output$table <- renderTable({
    get_most_simi(input$text, df_rec)
  })
}
shinyApp(ui = ui, server = server)
We used exploratory data analysis and machine learning to analyze the board game dataset. In the EDA, we explored how different characteristics of a board game relate to its average rating, and we then built models to predict the rating from possible predictors. Here are some interesting findings. The average rating differs from the geek rating across both categories and mechanics over the years. Preferred mechanics were more similar across age groups and player groups than preferred categories were. The top-rated categories were card game, economics, fighting and fantasy. Card games and fantasy themes gradually took over the market, while war games lost popularity over the years. The top-rated mechanics were variable player powers, dice rolling, hand management and card drafting, and hand management became more popular over the years. Also, the longer a game takes to play, the more likely players are to rate it highly.
We used four machine learning methods to formally assess the association between board game characteristics and the average rating. A linear regression with stepwise selection by AIC gave a final model with 24 significant predictors and RMSE = 0.42. In the regression model, game difficulty had the strongest association with the average rating. Of all significant categories, storytelling was most strongly associated with higher average ratings. Of all significant mechanics, war games were most strongly associated with higher average ratings. Games categorized as rock-paper-scissors were most strongly associated with lower average ratings. The k-nearest neighbors model, using 20-fold cross validation, had an optimal k of 23 with RMSE = 0.46. A random forest with ntree = 450 and mtry = 30 was used to rank feature importance; game difficulty, year, maximum and minimum playing time, age requirement, and player groups were the most important features. A support vector machine with a linear kernel, also evaluated with 20-fold cross validation, gave the lowest RMSE, 0.35, which means our best model's predictions were typically about 0.35 points away from the true rating. Several board game categories and mechanics tend to do better than others, so to maximize the chances of a board game becoming popular, a game designer could make a game in one of these top categories or use some of the popular mechanics.
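RMSE is not literally the average error: it squares the errors before averaging, so a few large misses raise it more than they raise the mean absolute error (MAE). This is why the SVM output reports an MAE of 0.29 alongside an RMSE of 0.39. A tiny sketch with made-up prediction errors:

```r
# made-up prediction errors (predicted minus true rating), for illustration only
err <- c(0.1, -0.1, 0.1, 1.0)
# MAE: average size of the errors
mae <- mean(abs(err))      # 0.325
# RMSE: square root of the average squared error, pulled up by the single 1.0 miss
rmse <- sqrt(mean(err^2))  # about 0.507
```

RMSE is always at least as large as MAE, and the gap grows when errors are uneven.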
Finally, we built a board game recommender that, given a board game, recommends up to 10 similar games, measuring similarity as the Euclidean distance between the games' mechanic and category features.